Capstone Project
Updated on 2017-04-08
https://classroom.udacity.com/nanodegrees/nd009/parts/5375cf82-14fe-422d-b737-7bc893e20a6d
stay tuned…
[TOC]
Welcome to the Nanodegree
Get started with learning about your Nanodegree. Introduction to Decision Trees, Naive Bayes, Linear and Logistic Regression and Support Vector Machines. You can join the MLND student community by following this link and registering your email - https://mlnd-slack.udacity.com
WELCOME TO THE NANODEGREE
Welcome to MLND
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HG5IYufgDAo.mp4
Program Readiness
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dc9CmcGTnr0.mp4
What is Machine Learning?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/K45QM8Wi7BU.mp4
Machine Learning vs. Traditional Coding
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_N2iIB_bLXA.mp4
Applications of Machine Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kIM5D_W6Mh8.mp4
Connections to GA Tech
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/DysCmGKRpvs.mp4
Program Outline
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/m0cIDrRWyLw.mp4
What is ML
Introduction to Machine Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bYeteZQrUcE.mp4
Decision Trees
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1RonLycEJ34.mp4
Decision Trees Quiz
QUIZ QUESTION
Between Gender and Age, which one seems more decisive for predicting which app the users will download?
- Gender
- Age
Decision Trees Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h8zH47iFhCo.mp4
Naive Bayes
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jsLkVYXmr3E.mp4
Naive Bayes Quiz
QUIZ QUESTION
If an e-mail contains the word “cheap”, what is the probability of it being spam?
40%
60%
80%
Naive Bayes Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YKN-fjuZ1VU.mp4
Gradient Descent
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BEC0uH1fuGU.mp4
Linear Regression Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sf51L0RN6zc.mp4
QUIZ QUESTION
What’s the best estimate for the price of a house?
80k
120k
190k
Linear Regression Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L5QBqYDNJn0.mp4
Logistic Regression Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wQXKdeVHTmc.mp4
QUIZ QUESTION
Does the student get Accepted?
Yes
No
Logistic Regression Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/JuAJd9Qvs6U.mp4
Support Vector Machines
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Fwnjx0s_AIw.mp4
Support Vector Machines Quiz
QUIZ QUESTION
Which one is a better line?
The yellow line
The blue line
Support Vector Machines Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/JrUtTwfnsfM.mp4
Neural Networks
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xFu1_2K2D2U.mp4
Kernel Method
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/x0JqH6-Dhvw.mp4
Kernel Method Quiz
QUIZ QUESTION
Which equation could come to our rescue?
x+y
xy
x^2
Kernel Method Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dRFd6HaAXys.mp4
Recap and Challenge
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ecREasTrKu4.mp4
K-means Clustering
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pv_i08zjpQw.mp4
Hierarchical Clustering
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1PldDT8AwMA.mp4
Practice Project: Detect Spam
Practice Project: Using Naive Bayes to detect spam.
From time to time you will be encouraged to work on practice projects aimed at deepening your understanding of the concepts being taught. In this practice project, you will implement the Naive Bayes algorithm to detect spam text messages (as taught by Luis earlier in the lesson) using an open source dataset.
Here is the notebook; the solutions are included.
Summary
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hJEuaOUu2yA.mp4
MLND Program Orientation
Before the Program Orientation
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/73CdKtS-IwU.mp4
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/fxNSn63xFvA.mp4
Projects and Progress
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Z9ZLMQWsbsk.mp4
Career Development
Being enrolled in one of Udacity’s Nanodegree programs comes with many career-based perks. Our goal is to help you take your learning from this program and apply it in the real world and in your career.
As you venture through the Machine Learning Engineer Nanodegree program, you’ll have the opportunity to
- Update your resume through a peer-reviewed process using conventions that recruiters expect and get tips on how to best represent yourself to pass the “6 second screen”;
- Create a cover letter that portrays your soft and hard skills, and most importantly your passion for the particular job you are applying to;
- Get your GitHub and LinkedIn profiles reviewed through the lens of a recruiter or hiring manager, focusing on how your profile, projects, code, education and past experiences represent you as a potential candidate;
- Practice for a technical interview with a professional reviewer on a variety of topics;
- And more!
You can also find career workshops that Udacity has hosted over the years, where you can gain a plethora of information to prepare you for your ventures into a career. Udacity also provides job placement opportunities with many of our industry partners. To take advantage of this opportunity, fill out the career section of your Udacity professional profile, so we know more about you and your career goals! If all else fails, you can always default to emailing the career team at career-support@udacity.com.
Connecting with Your Community
Your Nanodegree community will play a huge role in supporting you when you get stuck and in helping you deepen your learning. Getting to know your fellow students will also make your experience a lot more fun!
To ask and answer questions, and to contribute to discussions, head to your program forum. You can get there by clicking the Discussion link in the classroom and in the Resources tab in your Udacity Home. You can search to see if someone has already asked a question related to yours, or you can make a new post if no one has. Chances are, someone else is wondering about the same thing you are, so don’t be shy!
In addition, students may connect with one another through Slack, a team-oriented chat program. You can join the MLND Slack student community by following this link and registering your email. There are many content-related channels where you can speak with students about a particular concept, and you can even discuss your first week in the program in the #first-week-experience channel. You can also talk with MLND graduates and alumni to get a live perspective on the program in the #ask-alumni channel! You can find the student-run community wiki here.
Support from the Udacity Team
The Udacity team is here to help you reach your Nanodegree program goals! You can interact with us in the following ways:
- Forums: Along with your student community, the Udacity team maintains a strong presence in the forum to help make sure your questions get answered and to connect you with other useful resources.
- 1-on-1 Appointments: If you get stuck working on a project in the program, our mentors are here to help! You can set up a half-hour appointment with a mentor available for the project at a time you choose to get assistance.
- Project Reviews: During the project submission process, your submissions will be reviewed by a qualified member of our support team, who will provide comments and helpful feedback on where your submission is strongest, and where your submission needs improvement. The reviews team will support your submissions all the way up to meeting specifications!
- By email: You can always contact the Machine Learning team with support-related questions using machine-support@udacity.com. Please make sure that you have exhausted all other options before doing so!
Find out more about the support we offer using the Resources tab in your Udacity Nanodegree Home.
How Does Project Submission Work?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jCJa_VP6qgg.mp4
Integrity and Mindset
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zCOr3O50gQM.mp4
How Do I Find Time for My Nanodegree?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/d-VfUw7wNEQ.mp4
All calendar applications now let you set up a weekly reminder. I have included a screen capture below of how to set one up in Google Calendar. We recommend coming into the classroom at least twice a week. It is a best practice to set up at least one repeating weekly reminder to continue the Nanodegree program.
Final Tips
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1ZVBvM54hQw.mp4
Wrapping Up the Program Orientation
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xujb3Rqxuog.mp4
You now have all the info you need to proceed on your Nanodegree journey!
- If you have any further questions, perhaps about payment or enrollment, read your Nanodegree Student Handbook for more details.
- Download Udacity’s mobile app to learn on the go!
- Remember to put in time consistently, engage with your community, take advantage of the resources available to you, and give us feedback throughout the program.
We are so glad to have you with us! Return to your Udacity home to keep learning. Good luck!
(Optional) Exploratory Project
Software Requirements
- Press Windows + R and open a command prompt
- Type `pip install --user pandas jupyter`
- Oops, it fails with: error: Microsoft Visual C++ 9.0 is required. Get it from http://aka.ms/vcpython27
- Download and install it
- The pip install then completes successfully
Starting the Project
First try
- Press Windows + R and open a command prompt
- Type `cd <path>`; my `<path>` is `G:\Udacity\MLND\machine-learning-master\projects\titanic_survival_exploration`
- Type `g:` to switch to that drive
- Type `bash jupyter notebook titanic_survival_exploration.ipynb`; it shows "'bash' is not recognized as an internal or external command, operable program or batch file"
- Failed
Second try
- Open Git Bash
- `cd` to `<path>`, written with forward slashes: `G:/Udacity/MLND/machine-learning-master/projects/titanic_survival_exploration`
- Type `bash jupyter notebook titanic_survival_exploration.ipynb`
- Failed
Third try
- Install Anaconda
- Press Windows + R and open a command prompt
- Type `cd <path>`; my `<path>` is `G:\Udacity\MLND\machine-learning-master\projects\titanic_survival_exploration`
- Type `g:` to switch to that drive
- Type `jupyter notebook titanic_survival_exploration.ipynb`
- Done
Fourth try
- Open Git Bash
- `cd` to `<path>`, written with forward slashes: `G:/Udacity/MLND/machine-learning-master/projects/titanic_survival_exploration`
- Type `jupyter notebook titanic_survival_exploration.ipynb`
- Done
- `ipython notebook` also works in place of `jupyter notebook`

Question 4 (stay tuned): `Pclass == 3`
Career: Orientation
Throughout your Nanodegree program, you will see Career Development Lessons and Projects that will help ensure you’re presenting your new skills best during your job search. In this short lesson, meet the Careers team and learn about the career resources available to you as a Nanodegree student.
If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.
If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.
ORIENTATION
Career Services Available to You
Meet the Careers Team
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oR1IxPTTz0U.mp4
Resources
Your Udacity Profile
Connect to Hiring Partners through your Udacity Professional Profile
In addition to the Career Lessons and Projects you’ll find in your Nanodegree program, you have a Udacity Professional Profile linked in the left sidebar.
Your Udacity Professional Profile features important, professional information about yourself. When you make your profile public, it becomes accessible to our Hiring Partners, as well as to recruiters and hiring managers who come to Udacity to hire skilled Nanodegree graduates.
As you complete projects in your Nanodegree program, they will be automatically added to your Udacity Professional Profile to ensure you’re able to show employers the skills you’ve gained through the program. In order to differentiate yourself from other candidates, make sure to go in and customize those project cards. In addition to these projects, be sure to:
- Keep your profile updated with your basic info and job preferences, such as location
- Ensure you upload your latest resume
- Return regularly to your Profile to update your projects and ensure you’re showcasing your best work
If you are looking for a job, make sure to keep your Udacity Professional Profile updated and visible to recruiters!
Model Evaluation and Validation
Apply statistical analysis tools to model observed data, and gauge how well your models perform.
Project: Predicting Boston Housing Prices
For most students, this project takes approximately 8 - 15 hours to complete (about 1 - 3 weeks).
P1 Predicting Boston Housing Prices
STATISTICAL ANALYSIS
Intro: Model Evaluation and Validation
Intro to Model Evaluation and Validation
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cseqEWRDs5Q.mp4
Model Evaluation What You’ll Watch
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jYZO17CeZDI.mp4
Model Evaluation What You’ll Learn
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZLOucNwuqCk.mp4
Course Outline - Fitting It All Together
The ultimate goal of Machine Learning is to have data models that can learn and improve over time. In essence, machine learning is making inferences on data from previous examples.
In this first section we review some basic statistics and numerical tools to manipulate & process our data.
Then we will move on to modeling data, reviewing different data types and seeing how they play out in the case of one specific dataset. The section ends by introducing the basic tool of a supervised learning algorithm.
Next, we’ll see how to use our dataset for both training and testing data, and review various tools for how to evaluate how well an algorithm performs.
Finally, we’ll look at the reasons that errors arise, and the relationship between adding more data and adding more complexity in getting good predictions. The last section ends by introducing cross validation, a powerful meta-tool for helping us use our tools correctly.
Model Evaluation What You’ll Do
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kJCAuHjWiOA.mp4
Prerequisites
Statistics Review & Supporting Libraries
In this section we will go over some prerequisites for this course, review basic statistics concepts and problem sets, and finally teach you how to use some useful data analysis Python libraries to explore real-life datasets using the concepts you reviewed earlier.
Prerequisites
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ANDJ8i_deE.mp4
Here are shortcuts to the prerequisite statistics courses:
Udacity’s descriptive stats course [link]
Udacity’s inferential stats course [link]
You will also need to have just a little bit of git experience — enough to check out our code repository. If you’ve ever used git before, you should be fine. If this is truly your first time with git, once you get to the first mini-project, you may want to quickly look at the first lesson of Udacity’s git course.
Key terms: mode, mean, variance, standard deviation.
Bessel's Correction: use `n-1` instead of `n` in the denominator when computing the sample standard deviation (sample SD).
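A minimal numpy illustration of Bessel's correction (the `ddof` argument controls the n vs. n-1 denominator; the numbers are toy data):

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.std(sample))          # population SD: divides the sum of squared deviations by n
print(np.std(sample, ddof=1))  # sample SD: divides by n-1 (Bessel's correction)
```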
Measures of Central Tendency
Introduction: Topics Covered
Measures of Central Tendency
In this lesson, we will cover the following topics:
- Mean
- Median
- Mode

This lesson is meant to be a refresher for those with no statistics background; if you are already familiar with these concepts, you may skip this lesson.
Which Major?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mIzPoh_kqw4.mp4
Quiz: Which Major?
Enter your answers as a number with no commas or symbols ($). Enter the number in thousands (5 digits)
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t0yIetl9ZxI.mp4
One Number to Describe Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6QfvhQ0En0E.mp4
Which Number to Choose?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7E7Czixpviw.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TSw2AAaKxBA.mp4
Mode of Dataset
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/80BAbiEWsaY.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HvOjcTFlVTI.mp4
Mode of Distribution
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/47JDwoDUxP8.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/s9dHF4MMGx0.mp4
Mode - Negatively Skewed Distribution
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xBnhUJENAtk.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oNRtSJvtkJc.mp4
Mode - Uniform Distribution
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TE2BZql64XY.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mFm0FfWHlXw.mp4
More than One Mode?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1GuHNqJNY2M.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/9bsmO7cKzPk.mp4
Mode of Categorical Data
Quiz
The data doesn’t need to be numeric to find a mode: we can also compute the mode for categorical data as well! On the next slide, you’ll be asked to find the mode of a categorical data set: the preferred M&M flavor of 8,000 Udacity students.
Answer
Remember, the mode occurs on the X-axis, so you are looking for whatever value has the highest frequency. The numbers 7,000 and 1,000 are the actual frequencies. The mode, itself, is “Plain.”
More o’ Mode!
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wBduq7St2Ak.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/74mudn321tA.mp4
Find the Mean
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/S4KbzIyEwV8.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/El5cY2jlzuM.mp4
Procedure for Finding Mean
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nRurXCTYxG4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lK2lDLdE6iA.mp4
Iterative Procedure
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/r2s9INGd-Ls.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/NcWS_BvM3IU.mp4
Helpful Symbols
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/w4_8YCp-9fI.mp4
Properties of the Mean
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uSXJtEpwEVM.mp4
Quiz: Properties Of The Mean
Please note: The last option, “The mean will change if we add an extreme value to the dataset.”, is not necessarily a property of the mean, but more a behavioral tendency. For the purposes of this quiz, however, you can mark it as a property.
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AqlvTMZg6HY.mp4
Mean with Outlier
What Can You Expect?
UNC
Requirement for Median
Find the Median
Median with Outlier
Find Median with Outlier
Measures of Center
Order Measures of Center 1
Order Measures of Center 2
Use Measures of Center to Compare
Link to poll: How many Facebook friends do you have?
Link to poll results
Mashable: Who is an Average Facebook User?
Zeebly Social Me (optional but fun use of statistics)
Udacians’ Facebook Friends - Mean
Link to Udacians’ Facebook Friends
Copy and paste the data into your own spreadsheet to perform the calculations. From Google Drive (at the top of the page once you’re signed in to your Google account), click the button on the left that says “CREATE” and click “Spreadsheet.” Please round your answer to two decimal places.
Udacians’ Facebook Friends - Median
Link to Udacians’ Facebook Friends
Copy and paste the data into your own spreadsheet to perform the calculations. From Google Drive (at the top of the page once you’re signed in to your Google account), click the button on the left that says “CREATE” and click “Spreadsheet.”
Formula for Location of Median
Wrap Up - Measures of Center
Quiz: Wrap Up - Measures Of Center
Here is a short doc outlining Mean, Median, and Mode. http://tinyurl.com/measureOfCenter
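As a quick recap in code, a minimal sketch computing all three measures of center on a toy dataset:

```python
import numpy as np
from collections import Counter

data = [5, 8, 8, 9, 12, 12, 12, 40]

print(np.mean(data))                       # mean: sum / count, pulled up by the outlier 40
print(np.median(data))                     # median: middle value of the sorted data
print(Counter(data).most_common(1)[0][0])  # mode: the most frequent value (12)
```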
Good Job!
Variability of Data
Introduction: Topics Covered
In this lesson, we will cover the following topics:
- Interquartile Range (IQR)
- Outliers
- Standard Deviation
- Bessel’s Correction
This lesson is meant to be a refresher for those with no statistics background; if you are already familiar with these concepts, you may skip this lesson.
Social Networkers’ Salaries
Should You Get an Account?
What’s the Difference?
Quantify Spread
Does Range Change?
Mark Z the Outlier
Chop Off the Tails
Where Is Q1?
Q3 - Q1
IQR
What Is an Outlier?
Define Outlier
Match Boxplots
Mean Within IQR?
Problem with IQR
Measure Variability
Calculate Mean
Deviation from Mean
Average Deviation
Equation for Average Deviation
Be Happy and Get Rid of Negatives
Absolute Deviations
Average Absolute Deviation
Formula for Avg. Abs. Dev.
Squared Deviations
Sum of Squares
Average Squared Deviation
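The node titles above build up the variance step by step; here is a minimal numpy sketch of that same sequence on toy data:

```python
import numpy as np

data = np.array([6.0, 1.0, 2.0, 5.0, 3.0, 7.0])  # toy data

deviations = data - data.mean()            # deviation of each value from the mean
print(deviations.mean())                   # ~0: positive and negative deviations cancel out

abs_deviations = np.abs(deviations)
print(abs_deviations.mean())               # average absolute deviation

squared_deviations = deviations ** 2
print(squared_deviations.sum())            # sum of squares
print(squared_deviations.mean())           # average squared deviation = (population) variance
print(np.sqrt(squared_deviations.mean()))  # standard deviation
```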
Numpy&Pandas Tutorials
Numpy and Pandas Tutorials
Now that you have reviewed some basic statistics, let’s go over some Python libraries that allow you to explore data and process large datasets.
Specifically, we will go over numpy, which allows us to process large amounts of numerical data, and pandas Series and DataFrames, which allow us to store large datasets and extract information from them.
Numpy Library Documentation: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
Pandas Library Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/
We highly recommend going through this resource by Justin Johnson if you have not worked with Numpy before.
Another great resource is the SciPy-lectures series on this topic.
Numpy
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/l_Tzjxfa_5g.mp4
Numpy Playground
Here’s the code
import numpy as np

'''
The following code is to help you play with Numpy, which is a library
that provides functions that are especially useful when you have to
work with large arrays and matrices of numeric data, like doing
matrix-matrix multiplications. Also, Numpy is battle tested and
optimized so that it runs fast, much faster than if you were working
with Python lists directly.
'''

'''
The array object class is the foundation of Numpy, and Numpy arrays are like
lists in Python, except that everything inside an array must be of the
same type, like int or float.
'''
# Change False to True to see Numpy arrays in action
if False:
    array = np.array([1, 4, 5, 8], float)
    print array
    print ""
    array = np.array([[1, 2, 3], [4, 5, 6]], float)  # a 2D array/Matrix
    print array

'''
You can index, slice, and manipulate a Numpy array much like you would with a
Python list.
'''
# Change False to True to see array indexing and slicing in action
if False:
    array = np.array([1, 4, 5, 8], float)
    print array
    print ""
    print array[1]
    print ""
    print array[:2]
    print ""
    array[1] = 5.0
    print array[1]

# Change False to True to see Matrix indexing and slicing in action
if False:
    two_D_array = np.array([[1, 2, 3], [4, 5, 6]], float)
    print two_D_array
    print ""
    print two_D_array[1][1]
    print ""
    print two_D_array[1, :]
    print ""
    print two_D_array[:, 2]

'''
Here are some arithmetic operations that you can do with Numpy arrays
'''
# Change False to True to see Array arithmetics in action
if False:
    array_1 = np.array([1, 2, 3], float)
    array_2 = np.array([5, 2, 6], float)
    print array_1 + array_2
    print ""
    print array_1 - array_2
    print ""
    print array_1 * array_2

# Change False to True to see Matrix arithmetics in action
if False:
    array_1 = np.array([[1, 2], [3, 4]], float)
    array_2 = np.array([[5, 6], [7, 8]], float)
    print array_1 + array_2
    print ""
    print array_1 - array_2
    print ""
    print array_1 * array_2

'''
In addition to the standard arithmetic operations, Numpy also has a range of
other mathematical operations that you can apply to Numpy arrays, such as
mean and dot product.
Both of these functions will be useful in later programming quizzes.
'''
if True:
    array_1 = np.array([1, 2, 3], float)
    array_2 = np.array([[6], [7], [8]], float)
    print np.mean(array_1)
    print np.mean(array_2)
    print ""
    print np.dot(array_1, array_2)
Pandas
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8ay7tX26YxE.mp4
Pandas Playground – Series
Here’s the code
Pandas Playground – Dataframe
Here’s the code
Create a DataFrame
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hMxOuWJaVDA.mp4
Here is a link to the pandas documentation.
Here’s also an excellent series of tutorials as IPython notebooks
(Thank you to Dominique Luna for sharing!)
Also note: you do not need to use `pandas.Series`; you can pass in Python lists as the values in this case:

from pandas import DataFrame

# countries, gold, silver, and bronze are plain Python lists defined in the exercise
olympic_medal_counts_df = DataFrame(
    {'country_name': countries,
     'gold': gold,
     'silver': silver,
     'bronze': bronze})
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-6Zw3Y4iXRY.mp4
Dataframe Columns
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/G4gFryXnrr8.mp4
Pandas Playground - Indexing Dataframes
Here’s the code
Pandas Vectorized Methods
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hvSEZxcH9PM.mp4
As a refresher on lambda: lambda functions are small inline functions that are defined on-the-fly in Python. `lambda x: x >= 1` will take an input x and return `x >= 1`, i.e. a boolean that equals True or False.
In this example, `map()` and `applymap()` create a new Series or DataFrame by applying the lambda function to each element. Note that `map()` can only be used on a Series to return a new Series, and `applymap()` can only be used on a DataFrame to return a new DataFrame.
For further reference, please refer to the official documentation on lambda.
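To make that concrete, here is a minimal sketch (toy data, invented column names) of `map()` on a Series and `applymap()` on a DataFrame:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3])
print(s.map(lambda x: x >= 1))        # element-wise on a Series -> new Series of booleans

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.applymap(lambda x: x >= 2))  # element-wise on a DataFrame -> new DataFrame of booleans
```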
Average Bronze Medals
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AjniaYFfCeg.mp4
You might find using “boolean indexing” helpful for this problem.
Here is a link to the pandas documentation.
Here’s also an excellent series of tutorials as IPython notebooks
(Thank you to Dominique Luna for sharing!)
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kibtHtPgWvs.mp4
Average Gold, Silver, and Bronze Medals
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/19b96U6dLtY.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HbCmZuVp548.mp4
Matrix Multiplication and Numpy Dot
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/yAgqlTfWc9E.mp4
The second line in the green vector on the top line starting at 1:15 should read: “1 x 4 + 2 x 5”.
This vector should also be a row vector (1 x 3 matrix) instead of a column vector (3 x 1 matrix).
You can read more about numpy.dot or matrix multiplication with numpy below:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
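For reference, a minimal numpy sketch contrasting `np.dot` (true matrix multiplication) with the element-wise `*` operator:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.dot(A, B))  # matrix product: [[19 22] [43 50]]
print(A * B)         # element-wise product, NOT matrix multiplication
```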
Olympics Medal Points
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uKvbguVQYh4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/EQMUh4Id4Po.mp4
- numpy, which allows us to process large amounts of numerical data
- pandas, which allows us to store large datasets and extract information from them
- Numpy Playground
- Pandas Playground – Series
- Pandas Playground – Dataframe
- Pandas Playground – Indexing Dataframes
- `map()` and `applymap()` create a new Series or DataFrame by applying a lambda function to each element; `map()` can only be used on a Series to return a new Series, and `applymap()` can only be used on a DataFrame to return a new DataFrame
- Here is a link to the pandas documentation.
- Here’s also an excellent series of tutorials as IPython notebooks: intro-to-pandas-data-structures
DATA MODELING
scikit-learn Tutorial
Press Windows + R and open a command prompt:
- Type `pip show scikit-learn --version`; I already have scikit-learn installed
- Type `pip install --upgrade 'scikit-learn>=0.17,<0.18'`; it shows "The system cannot find the file specified." (cmd does not treat the single quotes as quoting characters, which is likely why this fails)
- Type `conda -c install scikit-learn=0.17`; it shows `conda-script.py: error: unrecognized arguments: -c`

Then open Git Bash:
- Type `pip install --upgrade 'scikit-learn>=0.17,<0.18'`; success
- Type `conda -c install scikit-learn=0.17`; failed again with `conda-script.py: error: unrecognized arguments: -c`
Introduction to scikit-learn
Scikit-Learn
Scikit-learn is an open source Machine Learning library built on NumPy, SciPy and matplotlib. It has a Python interface and supports various regression, classification and clustering algorithms. You will be using this library in your projects throughout this program.
Example using Scikit-learn
To show you how we can leverage sklearn (short for scikit-learn), here is an example of a simple linear regression model being used to make predictions on the Boston housing prices dataset, which comes preloaded with sklearn. We will not dive deep into the dataset per se, nor will we split our dataset into training and testing sets just yet (you will learn about the importance of this in the next lesson); the goal of this node is to give you a high-level view of how, with just a few lines of code, you can make predictions on a dataset using the sklearn tool.
This dataset consists of 506 samples with a dimensionality of 13. We will run a linear regression model on the feature set to make predictions on the prices.
We start by getting the necessary imports.
from sklearn import datasets # sklearn comes with a variety of preloaded datasets
from sklearn import metrics # calculate how well our model is doing
from sklearn.linear_model import LinearRegression
There are several ways in which we can load datasets in sklearn. For now, we will start with the most basic way, using a dataset which comes preloaded.
# Load the dataset
housing_data = datasets.load_boston()
We now define the model we want to use and herein lies one of the main advantages of using this library.
linear_regression_model = LinearRegression()
Next, we can fit our Linear Regression model on our feature set to make predictions for our labels (the prices of the houses). Here, housing_data.data is our feature set and housing_data.target are the labels we are trying to predict.
linear_regression_model.fit(housing_data.data, housing_data.target)
Once our model is fit, we make predictions as follows:
predictions = linear_regression_model.predict(housing_data.data)
Lastly, we want to check how our model does by comparing our predictions with the actual label values. Since this is a regression problem, we will use the r2 score metric. You will learn about the various classification and regression metrics in future lessons.
score = metrics.r2_score(housing_data.target, predictions)
And there we have it. We have trained a regression model on a dataset and calculated how well our model does, all with just a few lines of code and with all the math abstracted away from us. In the next nodes, we will walk you through installing sklearn on your system, and you will work with Katie on a sample problem.
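Putting the pieces above together, here is the same example as one minimal, runnable script (still no train/test split, as noted above; `load_boston` has been removed from recent scikit-learn releases, so this matches the older versions used in this course):

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LinearRegression

# preloaded Boston housing dataset: 506 samples, 13 features
housing_data = datasets.load_boston()

model = LinearRegression()
model.fit(housing_data.data, housing_data.target)

predictions = model.predict(housing_data.data)
print(metrics.r2_score(housing_data.target, predictions))  # R^2 score on the training data itself
```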
scikit-learn Installation
scikit-learn Installation
First, check that you have a working python installation. Udacity uses python 2.7 for our code templates and in-browser exercises.
We recommend using pip to install packages. First get and install pip from here. If you are using Anaconda, you can also use the conda command to install packages.
- To install scikit-learn via pip or Anaconda:
  - Open your terminal (Terminal on a Mac or cmd on a PC)
  - Install sklearn with the command `pip install scikit-learn` or `conda install scikit-learn`
- If you do not use pip or conda, further installation instructions can be found here.
Important note about scikit-learn versioning
scikit-learn has recently come out with a stable release of its library, version v0.18. With this version come a few changes to some of the functions we will talk about extensively in this course, such as `train_test_split`, `GridSearchCV`, `ShuffleSplit`, and `learning_curve`. The documentation available on scikit-learn’s website will reference v0.18; however, Katie, Udacity’s quizzes, and our projects are still written in v0.17. Please make sure that when using the documentation and scikit-learn, you reference version v0.17 and not version v0.18. In the near future, we will be updating our content to match the most current version.
Please see this forum post that provides more detail on this topic. If you have any additional questions or concerns, feel free to discuss them in the forums or email machine-support@udacity.com.
If you’ve accidentally installed version v0.18 through pip, not to worry! Use the command below to downgrade your scikit-learn version to v0.17:
`pip install --upgrade 'scikit-learn>=0.17,<0.18'`
If you are using the Anaconda distribution of Python and have scikit-learn installed as version v0.18, you can also use the command below to downgrade your scikit-learn version to v0.17 (the classroom originally lists this as `conda -c install scikit-learn=0.17`, which conda rejects; the form below works):
`conda install scikit-learn=0.17`
scikit-learn Code
In this next section Katie will walk through using the scikit-learn (or sklearn) documentation with a Gaussian Naive Bayes model. For this exercise it is not important to know all of the details of Naive Bayes or the code Katie is demonstrating. Focus on taking in the basic layout of sklearn, which we can then use to evaluate and validate any data model.
We will cover Naive Bayes along with other useful supervised models in much more detail in the upcoming Supervised Machine Learning course and use what we learn from this course to evaluate each model’s strengths and weaknesses.
If you want a sneak peek into Naive Bayes, you can check out the documentation here.
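For reference, the basic fit/predict workflow Katie demonstrates looks roughly like this minimal GaussianNB sketch, adapted from the scikit-learn documentation (toy data):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# toy data: two features per point, two classes
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

clf = GaussianNB()
clf.fit(X, y)                      # train the classifier on features X and labels y
print(clf.predict([[-0.8, -1]]))   # predict the class of a new point -> [1]
```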
Getting Started With sklearn
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/olGPVtH7KGU.mp4
Gaussian NB Example
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wpnDwiqTCJA.mp4
GaussianNB Deployment on Terrain Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/VBs6D4ggnYY.mp4
Quiz: GaussianNB Deployment On Terrain Data
To find the `ClassifyNB.py` script that you need to update for the quiz, you can click on the dropdown in the classroom code editor to get a list of files that will be used.
In the quiz that follows, the line that reads `pred = clf.predict(features_test)` is not necessary for drawing the decision boundary, at least as we’ve written the code. However, the whole point of making a classifier is that you can make predictions with it, so be sure to keep it in mind, since you’ll be using it in the quiz after this one.
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TcSnd3_hAy8.mp4
Nature of Data
- one-hot encoding
- This article describes seven different possible encodings of categorical data.
- see the answer about One-Hot Encoding that I wrote in the forum
- Enron Email Dataset
Data Types 1 - Numeric Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xmuWRRTPS4k.mp4
Data Types 2 - Categorical Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/uc_FRItxRMs.mp4
Data Types 3 - Time Series Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6OmiG5zzZoA.mp4
Treatment of categorical data
Many algorithms assume that input data is numerical. For categorical data, this often means converting categorical data into numerical data that represents the same patterns.
One standard way of doing this is with one-hot encoding. There are built-in methods for this in scikit-learn.
Essentially, a categorical feature with 3 possible values is converted into three binary features corresponding to the values. The new feature corresponding to the category a datum belongs to has value 1, while the other new features have value 0.
For example, in a dataset on baseball players, one feature might be “Handedness” which can take values “left” or “right”. Then the data:
- Joe
  - Handedness: right
- Jim
  - Handedness: left

Would become:
- Joe
  - Handedness/right: 1
  - Handedness/left: 0
- Jim
  - Handedness/right: 0
  - Handedness/left: 1
For ordinal data, it often makes sense to simply assign the values to integers. So the following data:
- Joe
  - Skill: low
- Jim
  - Skill: medium
- Jane
  - Skill: high

Would become:
- Joe
  - Skill: 0
- Jim
  - Skill: 1
- Jane
  - Skill: 2

These approaches are not all that is possible. In general, however, these simple approaches suffice; if there is a reason to use another encoding, it will depend on the nature of the data.
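A minimal pandas sketch of the ordinal case above (the column names and the mapping are just illustrative):

```python
import pandas as pd

players = pd.DataFrame({'name': ['Joe', 'Jim', 'Jane'],
                        'skill': ['low', 'medium', 'high']})

# map the ordered categories to integers that preserve their order
skill_order = {'low': 0, 'medium': 1, 'high': 2}
players['skill_encoded'] = players['skill'].map(skill_order)

print(players)
```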
Encoding using sklearn
Encoding in sklearn is done using the preprocessing module, which comes with a variety of options for manipulating data before analysis. We will focus on two forms of encoding for now: the LabelEncoder and the OneHotEncoder.
Label Encoder
First, we have to import the preprocessing library.
from sklearn import preprocessing
Let’s create a dummy dataframe named data with a column whose values we want to transform from categories to integers.
import pandas

# creating sample data
sample_data = {'name': ['Ray', 'Adam', 'Jason', 'Varun', 'Xiao'],
               'health': ['fit', 'slim', 'obese', 'fit', 'slim']}
# storing sample data in the form of a dataframe
data = pandas.DataFrame(sample_data, columns = ['name', 'health'])
We have 3 different labels that we are looking to categorize: slim, fit, obese. To do this, we will call `LabelEncoder()` and fit it to the column we are looking to categorize.
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(data['health'])
Once you have fit the label encoder to the column you want to encode, you can then transform that column to integer data based on the categories found in that column. That can be done as follows:
label_encoder.transform(data['health'])
This will give you the output:
array([0, 2, 1, 0, 2])
You can combine the `fit` and `transform` statements above by using `label_encoder.fit_transform(data['health'])`.
The string categorical health data has been mapped as follows:
- fit: 0
- obese: 1
- slim: 2
One thing to keep in mind when encoding data is that you do not want to skew your analysis because of the numbers that are assigned to your categories. For example, above, slim is assigned the value 2 and obese the value 1. The intention is not for slim to be a value that is empirically twice as likely to affect your analysis as obese. In such situations it is better to one-hot encode your data, as all categories are assigned a 0 or a 1 value, thereby removing any unwanted biases that may creep in if you simply label encode your data.
One-hot Encoder
If we were to apply the one-hot transformation to the same example we had above, we’d do it in Pandas using get_dummies as follows:
pandas.get_dummies(data['health'])
We could do this in sklearn on the label encoded data using OneHotEncoder as follows:
ohe = preprocessing.OneHotEncoder() # creating OneHotEncoder object
label_encoded_data = label_encoder.fit_transform(data['health'])
ohe.fit_transform(label_encoded_data.reshape(-1,1))
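For reference, a self-contained sketch showing both approaches and what they return (`get_dummies` gives a DataFrame with one 0/1 column per category, while `OneHotEncoder.fit_transform` returns a sparse matrix, densified here with `.toarray()` for inspection):

```python
import pandas as pd
from sklearn import preprocessing

data = pd.DataFrame({'name': ['Ray', 'Adam', 'Jason', 'Varun', 'Xiao'],
                     'health': ['fit', 'slim', 'obese', 'fit', 'slim']})

# pandas: one 0/1 indicator column per category (fit, obese, slim)
print(pd.get_dummies(data['health']))

# sklearn: label encode first, then one-hot encode the resulting integers
label_encoded = preprocessing.LabelEncoder().fit_transform(data['health'])
onehot = preprocessing.OneHotEncoder().fit_transform(label_encoded.reshape(-1, 1))
print(onehot.toarray())
```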
One-Hot Encoding
Quiz: One-Hot Encoding
Having trouble? Here is a useful forum discussion about this quiz.
Here are some other links you may find helpful - LabelEncoder, OneHotEncoder
Time series data leakage
When dealing with time-series data, it can be tempting to simply disregard the timing structure and treat it as the appropriate form of categorical or numerical data.
One important concern, however, arises if you are building a predictive project aimed at forecasting future data points. In this case, it is important NOT to use the future as a source of information! Since “hindsight is 20/20” and retrodictions are much easier than predictions, in predictive tasks it’s generally a good idea to use a training set made up of data from before a certain point, a validation set of data from some dates beyond that, and testing data leading up to the present. This way your algorithm won’t overfit by learning future trends.
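A minimal sketch of such a chronological split (the DataFrame, column names, and 60/20/20 proportions are illustrative assumptions):

```python
import pandas as pd

# toy time-series frame; in practice this would be your own data, sorted by date
df = pd.DataFrame({'date': pd.date_range('2016-01-01', periods=10),
                   'value': range(10)}).sort_values('date')

n = len(df)
train = df.iloc[:int(0.6 * n)]                   # oldest 60% -> training
validation = df.iloc[int(0.6 * n):int(0.8 * n)]  # next 20%   -> validation
test = df.iloc[int(0.8 * n):]                    # newest 20% -> testing
```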
A Hands-on Example
In the next section we’ll explore the famous Enron Email Dataset which was the focus of much of the Introduction to Machine Learning course.
While this specific dataset will play a less central role in this Nanodegree program, we will return to it a few times as an example to get practice with various techniques as they are introduced.
You can download our copy of the dataset here, along with the starting code for a variety of mini-projects. None of these mini-projects are required for completing the Nanodegree program, but they are great practice!
Datasets and Questions
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TdopVWltgqM.mp4
What Is A POI
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wDQhif-MWuY.mp4
EVALUATION AND VALIDATION
Training & Testing
Benefits of Testing
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7LGaeYfvRug.mp4
Features and Labels
As you continue your journey into designing ML algorithms using sklearn, you will come across two new terms namely, Features and Labels.
Features are individual measurable properties that you will be using to make predictions about your labels.
To understand this better, let’s use an example. Say you are trying to design a model that will be able to predict whether you will like a particular kind of cuisine or not. In this case, the label is `Yes` when the model thinks you will like said cuisine and `No` when it thinks otherwise. The features here could be things like `Sweetness`, `Spiciness`, `Bitterness`, `Tanginess` and the like. One thing to note is that when using our features we have to make sure they are represented in a way that doesn’t skew one feature over another; in other words, it’s usually a good idea to normalize or standardize your features. You will learn about these concepts in future lessons.
For now, as long as you understand the premise of what features and labels are and how they are used, you can proceed to the next node where Sebastian will explain this concept using a visual example.
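As a minimal, made-up illustration of that cuisine example (the numbers and feature order are invented purely for illustration):

```python
import numpy as np

# each row is one cuisine; each column is one feature:
# [sweetness, spiciness, bitterness, tanginess]
features = np.array([[0.8, 0.1, 0.2, 0.3],
                     [0.2, 0.9, 0.4, 0.6],
                     [0.5, 0.5, 0.1, 0.2]])

# one label per row: 1 = "Yes, you would like it", 0 = "No"
labels = np.array([1, 0, 1])
```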
Features and Labels Musical Example
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rnv0-lG9yKU.mp4
Evaluation Metrics
- Classification metrics (see the sketch below)
  - Accuracy
  - Confusion Matrix
  - F1 Score: F1 = 2 * (precision * recall) / (precision + recall)
- Regression metrics
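A minimal sklearn sketch of these metrics on toy labels and toy regression values (numbers invented for illustration):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_absolute_error, mean_squared_error)

# classification metrics on toy true/predicted labels
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
print(f1_score(y_true, y_pred))          # 2 * (precision * recall) / (precision + recall)

# regression metrics on toy continuous values
true_vals = [3.0, -0.5, 2.0, 7.0]
pred_vals = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(true_vals, pred_vals))
print(mean_squared_error(true_vals, pred_vals))
```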
Welcome to Evaluation Metrics Lesson
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IHuFWRM9f9Q.mp4
Overview of Lesson
In this lesson we’ll look at a small selection of common performance metrics and evaluate some algorithms with them on the Titanic dataset you used earlier.
There are a few important things to keep in mind here:
- There is a big difference in performance based on whether a train/test split is used.
- In general, performance on all metrics is correlated. But some algorithms may end up doing better or worse in different situations.
- The practical coding of any metric looks almost exactly the same! The difficulty comes in how to make the choice, not in how to implement it.
The topics covered in this lesson are:
- Accuracy
- Precision
- Recall
- Confusion Matrix
- F1 score
- Mean Absolute Error
- Mean Squared Error
If you are familiar with these concepts you can skip ahead, but we do recommend completing this lesson as a refresher nonetheless.
MANAGING ERROR AND COMPLEXITY
Causes of Error
matplotlib.pyplot.plot:
plt.plot(training_sizes, training_scores, 'go', label='training_scores')  # 'go': green circles
plt.plot(training_sizes, testing_scores, 'rs', label='testing_scores')    # 'rs': red squares
Causes of Error
Now that we have covered some basic metrics for measuring model performance, let us turn our attention to reasons why models exhibit errors in the first place.
In model prediction there are two main sources of errors that a model can suffer from.
Bias
Bias due to a model being unable to represent the complexity of the underlying data. A high Bias model is said to underfit the data.
Variance
Variance due to a model being overly sensitive to the limited data it has been trained on. A high Variance model is said to overfit the data.
In the coming videos, we will go over each in detail.
Error due to Bias
Error due to Bias - Accuracy and Underfitting
Bias occurs when a model has enough data but is not complex enough to capture the underlying relationships. As a result, the model consistently and systematically misrepresents the data, leading to low accuracy in prediction. This is known as underfitting. Simply put, bias occurs when we have an inadequate model.
Example 1
An example might be when we have objects that are classified by color and shape, for example easter eggs, but our model can only partition and classify objects by color. It would therefore consistently mislabel future objects–for example labeling rainbows as easter eggs because they are colorful.
Example 2
Another example would be continuous data that is polynomial in nature, with a model that can only represent linear relationships. In this case it does not matter how much data we feed the model because it cannot represent the underlying relationship. To overcome error from bias, we need a more complex model.
Error due to Variance
Error due to Variance - Precision and Overfitting
When training a model, we typically use a limited number of samples from a larger population. If we repeatedly train a model with randomly selected subsets of data, we would expect its predictions to be different based on the specific examples given to it. Here variance is a measure of how much the predictions vary for any given test sample.
Some variance is normal, but too much variance indicates that the model is unable to generalize its predictions to the larger population. High sensitivity to the training set is also known as overfitting, and generally occurs when either the model is too complex or when we do not have enough data to support it.
We can typically reduce the variability of a model’s predictions and increase precision by training on more data. If more data is unavailable, we can also control variance by limiting our model’s complexity.
Learning Curve
Learning Curve
Now that you have understood the Bias and Variance concepts let us learn about ways we can identify when our model performs well. The Learning Curve functionality from sklearn can help us in this respect. It allows us to study the behavior of our model with respect to the number of data points being considered to understand if our model is performing well or not.
To start with, we have to import the module:
from sklearn.learning_curve import learning_curve # sklearn 0.17
from sklearn.model_selection import learning_curve # sklearn 0.18
From the documentation, a reasonable implementation of the function would be as follows:
learning_curve(
estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
Here, `estimator` is the model which we are using to make our predictions, for example `GaussianNB()`; `X` and `y` are the features and labels respectively; `cv` is the cross-validation generator, for example `KFold()`; `n_jobs` is the parameter that decides if we want to run multiple operations in parallel; and `train_sizes` is the number of training examples that will be considered to generate the curve.
In the following quiz, you will define your learning curve for a model that we have designed for you and you will observe the results.
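A minimal sketch of such a call, assuming the scikit-learn 0.18+ module paths (for v0.17, swap in `sklearn.learning_curve` and the older `ShuffleSplit` signature as discussed earlier); the toy data and training sizes are invented for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import learning_curve, ShuffleSplit

# toy data: 100 points, 2 features, binary labels
rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=cv, n_jobs=1,
    train_sizes=np.linspace(0.1, 1.0, 5))

print(train_sizes)                # number of training examples at each point on the curve
print(train_scores.mean(axis=1))  # average training score per training size
print(test_scores.mean(axis=1))   # average cross-validation score per training size
```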
Noisy Data, Complex Model
Here’s the code
Improving the Validity of a Model
There is a trade-off in the value of simplicity or complexity of a model given a fixed set of data. If it is too simple, our model cannot learn about the data and misrepresents the data. However if our model is too complex, we need more data to learn the underlying relationship. Otherwise it is very common for a model to infer relationships that might not actually exist in the data.
The key is to find the sweet spot that minimizes bias and variance by finding the right level of model complexity. Of course with more data any model can improve, and different models may be optimal.
To learn more about bias and variance, we recommend this essay by Scott Fortmann-Roe.
In addition to the subset of data chosen for training, what features you use from a given dataset can also greatly affect the bias and variance of your model.
Bias, Variance, and Number of Features
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/OurfO1ZR2GU.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mpYpT6nZVEo.mp4
Bias, Variance & Number of Features Pt 2
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1lNAvDubBfI.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/X_AS8NBngsk.mp4
Overfitting by Eye
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sJgPnuiHrs8.mp4
Representative Power of a Model
Introduction
Curse of Dimensionality
In this short lesson, we have Charles Isbell, Senior Associate Dean at the Georgia Tech School of Computing, and Michael Littman, former CS department chair at Rutgers University and current Professor at Brown University, teach you about the curse of dimensionality.
These videos are from the OMSCS program at Georgia Tech.
Curse of Dimensionality
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/QZ0DtNFdDko.mp4
Curse of Dimensionality Two
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/OyPcbeiwps8.mp4
MODEL EVALUATION AND VALIDATION PROJECT
Predicting Boston Housing prices.
Overview
Project Overview
In this project, you will apply basic machine learning concepts on data collected for housing prices in the Boston, Massachusetts area to predict the selling price of a new home. You will first explore the data to obtain important features and descriptive statistics about the dataset. Next, you will properly split the data into testing and training subsets, and determine a suitable performance metric for this problem. You will then analyze performance graphs for a learning algorithm with varying parameters and training set sizes. This will enable you to pick the optimal model that best generalizes for unseen data. Finally, you will test this optimal model on a new sample and compare the predicted selling price to your statistics.
Project Highlights
This project is designed to get you acquainted with working with datasets in Python and applying basic machine learning techniques using NumPy and Scikit-Learn. Before being expected to use many of the available algorithms in the sklearn library, it will be helpful to first practice analyzing and interpreting the performance of your model.
Things you will learn by completing this project:
- How to use NumPy to investigate the latent features of a dataset.
- How to analyze various learning performance plots for variance and bias.
- How to determine the best-guess model for predictions from unseen data.
- How to evaluate a model’s performance on unseen data using previous data.
Software Requirements
Description
The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you’ve come across the Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients’ homes.
Software and Libraries
This project uses the following software and Python libraries:
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
Starting the Project
For this assignment, you can find the `boston_housing` folder containing the necessary project files on the Machine Learning projects GitHub, under the `projects` folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains three files:
- `boston_housing.ipynb`: This is the main file where you will be performing your work on the project.
- `housing.csv`: The project dataset. You’ll load this data in the notebook.
- `visuals.py`: This Python script contains helper functions that create the necessary visualizations.
In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command `jupyter notebook boston_housing.ipynb` to open up a browser window or tab to work with your notebook. Alternatively, you can use the command `jupyter notebook` or `ipython notebook` and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Predicting Boston Housing Prices project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub repo in a folder named `boston_housing` for ease of access:
- The `boston_housing.ipynb` notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Predicting Boston Housing Prices
The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you’ve come across the Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your clients’ homes.
Project Files
For this assignment, you can find the `boston_housing` folder containing the necessary project files on the Machine Learning projects GitHub, under the `projects` folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
Evaluation
Your project will be reviewed by a Udacity reviewer against the Predicting Boston Housing Prices project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub repo in a folder named `boston_housing` for ease of access:
- The `boston_housing.ipynb` notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
PROJECT
- Predicting Boston Housing Prices project rubric
- Open jupyter: press Windows + R, type `cd <path>`, then `<path>`
- Question 9: do not forget to modify the parameter to fit the sklearn v0.17 API:
  `cv_sets = ShuffleSplit(X.shape[0], n_iter = 10, test_size = 0.20, random_state = 0)`
Project modification
Kaggle (stay tuned)
Career: Job Search Strategies
Opportunity can come when you least expect it, so when your dream job comes along, you want to be ready!
After completing these lessons, be sure to complete the Cover Letter Review project and 1 of the 3 Resume Review projects.
If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.
If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.
JOB SEARCH STRATEGIES
Cover Letter
Resume Review (Entry-level)
Resume Review (Career Change)
Resume Review (Prior Industry Experience)
Cover Letter Review
Conduct a Job Search
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/axcFtHK6If4.mp4
NVIDIA
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/C6Rt9lxMqHs.mp4
Job Search Mindset
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cBk7bno3KS0.mp4
Target Your Application to An Employer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/X9JBzbrkcvs.mp4
Open Yourself Up to Opportunity
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1OamTNkk1xM.mp4
Refine Your Resume
Convey Your Skills Concisely
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xnQr3ohml9s.mp4
Effective Resume Components
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AiFcaHRGdEA.mp4
Resume Structure
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/POM0MqLTj98.mp4
Describe Your Work Experiences
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B1LED4txinI.mp4
Description bullet points should convey:
- Action
- Numbers
- Success
Resume Reflection
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8Cj_tCp8mls.mp4
Resume Review
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L3F2BFGYMtI.mp4
Types of Resume Formats
Resume formats can be split into three categories, depending on the job candidate:
- Entry-level (about 0-3 years of experience)
- Career Change (3+ years of work experience, looking to change career paths)
- Prior Industry Experience (3+ years of work experience; looking to level up in their career path by upskilling)
Build Your Resume
Resumes are required in job applications and recruiting. The most effective resumes are ones aimed at a specific job. In this project, you will find a job posting and target your resume to those job duties and requirements. Once you complete this project, you’ve successfully applied targeted job search strategies and are ready to look for work!
Receive a review of your resume that is tailored to your level of experience. The format and organization of your resume will vary depending on your experience in the field. To ensure you’re highlighting your most relevant skills, you will submit your resume to the 1 of 3 review projects that best matches your experience level.
- Entry-level: For entry-level job applicants with 0-3 years experience in the field. Best suited for applicants who have recently graduated from their formal education and have limited work experience.
- Career Change: For those seeking a career change with 3+ years experience in an unrelated field. For example, if you’re a teacher looking for work as a data analyst, or even from project management to front-end development.
- Prior Industry Experience: For applicants with 3+ years of prior experience in a related field. This would include those with experience in software development looking for work in mobile development, or even from data science to machine learning.
Project Resources
- Project Rubrics: Your project will be reviewed by a Udacity Career Reviewer against these rubrics.
- Project Checklists: Based on the project rubric, this is a handy checklist to use during your resume building.
- Career Resource Center: Find additional tips and guides on developing your resume.
Resume Template Options
* Build your own! This will ensure your resume is unique.
* [Resume Genius: Resume Templates](https://resumegenius.com/resume-templates)
* [Resume Builder](https://www.livecareer.com/resume-builder)
Tips for Bullet Points
- Describe the following in your projects and experiences bullet points:
- Action
- Numbers
- Results
- UC Berkeley Action Verb List for Resumes & Cover Letters
Submit Your Resume for Review
Submission Instructions
- Find a job posting that you would apply to now or after your Nanodegree graduation. Judge if you would be a good fit for the role. (Note: If you’re more than 75% qualified for the job on paper, you’re probably a good candidate and should give applying a shot!)
- Refine your resume to target it to that job posting.
- Copy and paste, or link, the job posting in “Notes to reviewer” during submission.
- Optional: Remove any sensitive information, such as your phone number, from the submission.
- Submit your targeted resume as a .pdf to one of the following project submission pages dependent on your experience:
Share your Resume with Udacity Hiring Partners
Udacity partners with employers, who are able to contact Udacity students and alumni via your Professional Profile. Once you’ve completed the resume review project, make sure to upload your reviewed resume to your Profile!
Supervised Learning
Learn how Supervised Learning models such as Decision Trees, SVMs, Neural Networks, etc. are trained to model and predict labeled data.
Project: Finding Donors for CharityML
For most students, this project takes approximately 8 - 21 hours to complete (about 1 - 3 weeks).
17 LESSONS, 1 PROJECT
P2 Finding Donors for CharityML
SUPERVISED LEARNING TASKS
DECISION TREES
ID3(stay tuned)
ARTIFICIAL NEURAL NETWORKS
SUPPORT VECTOR MACHINES
NONPARAMETRIC MODELS
BAYESIAN METHODS
sklearn.naive_bayes.GaussianNB
sklearn.metrics.accuracy_score (see the sketch below)
- this Kaggle project
- Joint Distribution Analysis(stay tuned)
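As a quick reference for the two sklearn utilities listed above, here is a minimal sketch on a toy dataset (the data and variable names are illustrative, not taken from the course files):

```python
from sklearn.datasets import load_iris
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in 0.18+
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Toy data, just to show the API shape.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

clf = GaussianNB()                    # Gaussian Naive Bayes classifier
clf.fit(X_train, y_train)             # learn per-class feature distributions
pred = clf.predict(X_test)            # predict labels for held-out data
print(accuracy_score(y_test, pred))   # fraction of correct predictions
```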
ENSEMBLE OF LEARNERS
- weak learner
- boosting-survey
INTRODUCTION TO SUPERVISED LEARNING
Supervised Learning Intro
Supervised Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Vc6KuGcfVPM.mp4
What You’ll Watch and Learn
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zoN8QJYFka4.mp4
ML in The Google Self-Driving Car
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lL16AQItG1g.mp4
Supervised Learning What You’ll Do
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jgYqzo7UFsU.mp4
Acerous Vs. Non-Acerous
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TeFF9wXiFfs.mp4
Supervised Classification Example
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/buxApBhZCO0.mp4
Features and Labels Musical Example
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rnv0-lG9yKU.mp4
Features Visualization Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t0iflCpBUDA.mp4
Classification By Eye
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xeMDpSRTLWc.mp4
Introduction to Regression
More Regressions
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_CznJ6phPsg.mp4
Parametric regression
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6EC1w_fs5u8.mp4
K nearest neighbor
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/CWCLQ6eu2Do.mp4
How to predict
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/go7ITLl79h8.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/r8PsDjf9scc.mp4
Kernel regression
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZhJTGBbR18o.mp4
Parametric vs non parametric
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wKT8Ztzt6r0.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PVOWHYJV8P4.mp4
Which problems are regression?
Are Polynomials Linear?
Regressions in sklearn
Continuous Output Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/udJvijJvs1M.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FOwEL4S-SVo.mp4
Continuous Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Bp6oBbLw8qE.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IC-fo_A0PxQ.mp4
DECISION TREES
Decision Trees
Difference between Classification and Regression
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/i04Pfrb71vk.mp4
More Decision Tree
Linearly Separable Data
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lCWGV6ZuXt0.mp4
Multiple Linear Questions
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/t1Y-nzgI1L4.mp4
ARTIFICIAL NEURAL NETWORKS
Neural Networks
Neural Networks
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/L_6idb3ZXB0.mp4
Neural Nets Mini-Project
Introduction
This section has a mix of coding assignments, multiple-choice questions, and fill-in-the-blank questions.
Please do check the instructor notes as we have included relevant forum posts that will help you work through these problems. You can find the instructor notes below the text/video nodes in the classroom.
Build a Perceptron
Here is the relevant forum post for this quiz.
Note that here, and in the rest of the mini-project, a signal strength equal to the threshold results in a 0 being output (rather than a 1).
It is required that the dot product be strictly greater than the threshold, rather than greater than or equal to the threshold, to pass the assertion tests.
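A minimal sketch of that activation convention (the class and method names are illustrative, not the exact starter code):

```python
import numpy as np

class Perceptron(object):
    """Illustrative perceptron, not the course starter code."""

    def __init__(self, weights, threshold):
        self.weights = np.array(weights, dtype=float)
        self.threshold = threshold

    def activate(self, inputs):
        strength = np.dot(self.weights, inputs)   # signal strength
        # Strictly greater than the threshold fires a 1; equal or below outputs 0.
        return 1 if strength > self.threshold else 0

p = Perceptron(weights=[1.0, 1.0], threshold=1.0)
print(p.activate([0.5, 0.5]))  # strength 1.0 == threshold -> 0
print(p.activate([1.0, 0.5]))  # strength 1.5 > threshold  -> 1
```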
Threshold Meditation
Where to train Perceptrons
Perceptron Inputs
Neural Net Outputs
Perceptron Update Rule
This is the relevant forum post for this quiz.
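For reference, a sketch of the classic perceptron update rule; the mini-project's starter code may differ in details (for example, whether the threshold is also updated):

```python
import numpy as np

def perceptron_update(weights, inputs, target, prediction, eta=0.1):
    """One step of the classic rule: w_i += eta * (target - prediction) * x_i."""
    weights = np.array(weights, dtype=float)
    inputs = np.array(inputs, dtype=float)
    return weights + eta * (target - prediction) * inputs

# If the perceptron predicted 0 but the target was 1, the weights move toward the input.
print(perceptron_update([0.5, -0.2], [1.0, 2.0], target=1, prediction=0))
```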
Layered Network Example
Linear Representational Power
Activation Function Quiz
Perceptron Vs Sigmoid
Sigmoid Learning
Gradient Descent Issues
SUPPORT VECTOR MACHINES
Math behind SVMs
Introduction
In this lesson, Charles and Mike will walk you through the Math behind Support Vector Machines. If you would like to jump straight to the higher level concepts and start coding it up using scikit-learn, you can head to the next lesson where Sebastian and Katie will walk you through everything you will need to get up and running with a working SVM model.
The Best Line
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5yzSv4jYMyI.mp4
SVMs in Practice
Welcome to SVM
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gnAmmyQ_ZcQ.mp4
Separating Line
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mzKPXz-Yhwk.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/NTm_mA4akP4.mp4
NONPARAMETRIC MODELS
Instance Based Learning
Instance Based Learning Before
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ZTjot416e-g.mp4
BAYESIAN METHODS
Naive Bayes
Speed Scatterplot: Grade and Bumpiness
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IMWsjjIeOrY.mp4
Bayesian Learning
Intro
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PfJoHBLjkR8.mp4
Bayes Rule
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kv38EnIXSkY.mp4
Bayesian Inference
Intro
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AL9LH06uztM.mp4
Joint Distribution
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/RN7drTE2_oI.mp4
Bayes NLP Mini-Project
Bad Handwriting Exposition
Imagine your boss has left you a message from a location with terrible reception. Several words are impossible to hear. Based on some transcriptions of previous messages he’s left, you want to fill in the remaining words. To do this, we will use Bayes’ Rule to find the probability that a given word is in the blank, given some other information about the message.
Recall Bayes Rule:
P(A|B) = P(B|A)*P(A)/P(B)
Or in our case
P(a certain word|surrounding words) = P(surrounding words|a certain word)*P(a certain word) / P(surrounding words)
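A small maximum-likelihood sketch of this idea, counting which candidate word most often fills the blank between the surrounding words; the transcripts below are invented for illustration:

```python
from collections import Counter

transcripts = [
    "meet me at the office later",
    "meet me at the airport later",
    "meet me at the office today",
]

def blank_probabilities(before, after, messages):
    """Estimate P(candidate | before _ after) by counting occurrences."""
    counts = Counter()
    for message in messages:
        words = message.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == before and words[i + 1] == after:
                counts[words[i]] += 1
    total = float(sum(counts.values()))
    return {word: count / total for word, count in counts.items()} if total else {}

# "meet me at the ___ later": 'office' and 'airport' are equally likely in this toy corpus.
print(blank_probabilities("the", "later", transcripts))
```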
Calculations
Maximum Likelihood
Here’s the code
NLP Disclaimer
In the previous exercise, you may have thought of some ways we might want to clean up the text available to us.
For example, we would certainly want to remove punctuation, and generally want to make all strings lowercase for consistency. In most language processing tasks we will have a much larger corpus of data, and will want to remove certain features.
Overall, just keep in mind that this mini-project is about Bayesian probability. If you’re interested in the details of language processing, you might start with this Kaggle project, which introduces a more detailed and standard approach to text processing very different from what we cover here.
Optimal Classifier Example
Optimal Classifier Exercise
Here’s the code
Which Words Meditation
Joint Distribution Analysis
Domain Knowledge Quiz
Domain Knowledge Fill In
ENSEMBLE OF LEARNERS
Ensemble B&B
Ensemble Learning Boosting
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/w75WyRjRpAg.mp4
Back to Boosting
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PHBd2glewzM.mp4
Boosting Tends to Overfit
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UHxYXwvjH5c.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Hp4gJjSFSYc.mp4
SUPERVISED LEARNING PROJECT
Finding donors for CharityML
Overview
Project Overview
In this project, you will apply supervised learning techniques and an analytical mind on data collected for the U.S. census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause. You will first explore the data to learn how the census data is recorded. Next, you will apply a series of transformations and preprocessing techniques to manipulate the data into a workable format. You will then evaluate several supervised learners of your choice on the data, and consider which is best suited for the solution. Afterwards, you will optimize the model you’ve selected and present it as your solution to CharityML. Finally, you will explore the chosen model and its predictions under the hood, to see just how well it’s performing when considering the data it’s given.
Project Highlights
This project is designed to get you acquainted with the many supervised learning algorithms available in sklearn, and to also provide for a method of evaluating just how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided.
Things you will learn by completing this project:
- How to identify when preprocessing is needed, and how to apply it.
- How to establish a benchmark for a solution to the problem.
- What each of several supervised learning algorithms accomplishes given a specific dataset.
- How to investigate whether a candidate solution model is adequate for the problem.
Software Requirements
Description
CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After nearly 32,000 letters were sent to people in the community, CharityML determined that every donation they received came from someone who was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but only to those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm to best identify potential donors and reduce the overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.
Software and Libraries
This project uses the following software and Python libraries:
Python 2.7
NumPy
pandas
scikit-learn (v0.17)
matplotlib
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
Starting the Project
For this assignment, you can find the finding_donors
folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects
we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains three files:
- finding_donors.ipynb: This is the main file where you will be performing your work on the project.
- census.csv: The project dataset. You'll load this data in the notebook.
- visuals.py: This Python script provides supplementary visualizations for the project. Do not modify.
In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook finding_donors.ipynb
to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook
or ipython notebook
and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Finding Donors for CharityML project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named finding_donors
for ease of access:
- The finding_donors.ipynb notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Finding Donors for CharityML
CharityML is a fictitious charity organization located in the heart of Silicon Valley that was established to provide financial support for people eager to learn machine learning. After nearly 32,000 letters were sent to people in the community, CharityML determined that every donation they received came from someone who was making more than $50,000 annually. To expand their potential donor base, CharityML has decided to send letters to residents of California, but only to those most likely to donate to the charity. With nearly 15 million working Californians, CharityML has brought you on board to help build an algorithm to best identify potential donors and reduce the overhead cost of sending mail. Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield while also reducing the total number of letters being sent.
Project Files
For this assignment, you can find the finding_donors folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
Evaluation
Your project will be reviewed by a Udacity reviewer against the Finding Donors for CharityML project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named finding_donors for ease of access:
- The finding_donors.ipynb notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
PROJECT
windows + r
cd <path> (my path is G:\Udacity\MLND\machine-learning-master\projects\finding_donors)
jupyter notebook finding_donors.ipynb
- Finding Donors for CharityML project rubric
- pandas.get_dummies() (see the sketch after this list)
- reviews
- submit
- second submit in review
- second review
- third review
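Since pandas.get_dummies() is listed above, here is a minimal sketch of how it one-hot encodes categorical columns; the column names are illustrative, not the exact census fields:

```python
import pandas as pd

data = pd.DataFrame({
    "workclass": ["Private", "Self-emp", "Private"],
    "hours-per-week": [40, 50, 35],
})

# Non-numeric columns are expanded into 0/1 indicator columns; numeric ones pass through.
features = pd.get_dummies(data)
print(features.columns.tolist())
# e.g. ['hours-per-week', 'workclass_Private', 'workclass_Self-emp']
```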
Unsupervised Learning
Learn how to find patterns and structures in unlabeled data, perform feature transformations and improve the predictive performance of your models.
Project: Creating Customer Segments
For most students, this project takes approximately 10 - 15 hours to complete (about 1 - 2 weeks).
P3 Creating Customer Segments
CLUSTERING
- play with k-means clustering
- sklearn.cluster.KMeans
- Expectation Maximization
- The Enron dataset
- sklearn.preprocessing.MinMaxScaler (see the sketch after this list)
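A tiny sketch combining the two sklearn classes listed above, on made-up data, to show why rescaling matters before clustering:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [9.0,  20.0],
              [10.0, 10.0]])

# Rescale each feature to [0, 1] so no single feature dominates the distances.
X_scaled = MinMaxScaler().fit_transform(X)

# Two clusters; fit_predict returns a cluster label per sample.
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X_scaled)
print(labels)
```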
Introduction to Unsupervised Learning
Unsupervised Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8oZpT6Hekhk.mp4
What You’ll Watch and Learn
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1a68kAJAgIU.mp4
Clustering
Unsupervised Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Mx9f99bRB3Q.mp4
Clustering Movies
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/g8PKffm8IRY.mp4
More Clustering
Single Linkage Clustering
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HfikjFVM3dg.mp4
Quiz: Single Linkage Clustering
Please use a comma to separate the two objects that will be linked in your answer. For instance, to describe a link from a to b, write “a,b” as your answer in the box.
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vytc9CsjjAs.mp4
Single Linkage Clustering Two
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/aojgUed9M0w.mp4
Clustering Mini-Project
Clustering Mini-Project Video
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/68EGMItJiNM.mp4
K-Means Clustering Mini-Project
In this project, we’ll apply k-means clustering to our Enron financial data. Our final goal, of course, is to identify persons of interest; since we have labeled data, this is not a question that particularly calls for an unsupervised approach like k-means clustering.
Nonetheless, you’ll get some hands-on practice with k-means in this project, and play around with feature scaling, which will give you a sneak preview of the next lesson’s material.
The Enron dataset can be found here.
Clustering Features
The starter code can be found in k_means/k_means_cluster.py, which reads in the email + financial (E+F) dataset and gets us ready for clustering. You'll start by performing k-means based on just two financial features–take a look at the code, and determine which features the code uses for clustering.
Run the code, which will create a scatterplot of the data. Think a little bit about what clusters you would expect to arise if 2 clusters are created.
Deploying Clustering
Deploy k-means clustering on the financial_features data, with 2 clusters specified as a parameter. Store your cluster predictions to a list called pred, so that the Draw() command at the bottom of the script works properly. In the scatterplot that pops up, are the clusters what you expected?
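A minimal sketch of the deployment step; in the mini-project, finance_features is built by the starter code, so the two-feature list below is only a stand-in:

```python
from sklearn.cluster import KMeans

# Stand-in for the two-feature pairs built by the starter code.
finance_features = [[200000.0, 1000000.0],
                    [ 50000.0,  200000.0],
                    [300000.0, 5000000.0],
                    [ 80000.0,  150000.0]]

clf = KMeans(n_clusters=2, random_state=0)
pred = clf.fit_predict(finance_features)   # one cluster label (0 or 1) per data point
print(pred)                                # this is the list the Draw() call expects
```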
FEATURE ENGINEERING
Feature Scaling
Chris’s T-Shirt Size (Intuition)
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oaqjLyiKOIA.mp4
A Metric for Chris
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/O0bvLU4l0is.mp4
Feature Selection
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UAMwTr3cnok.mp4
Feature Selection
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8CpRLplmdqE.mp4
DIMENSIONALITY REDUCTION
- sklearn.decomposition.PCA (see the sketch after this list)
- This paper gives a fairly in-depth look at how the ICA algorithm works. It’s long, but comprehensive
- Cocktail Party Demo
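As a quick reference for sklearn.decomposition.PCA listed above, a small sketch on synthetic 2-D data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 2-D data with most of its variance along one direction.
rng = np.random.RandomState(0)
X = np.dot(rng.randn(200, 2), [[3.0, 0.0], [1.0, 0.5]])

pca = PCA(n_components=2)
pca.fit(X)

print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
print(pca.components_)                # the principal directions
X_reduced = pca.transform(X)[:, :1]   # keep only the first principal component
```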
PCA
Data Dimensionality
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gg7SAMMl4kM.mp4
Trickier Data Dimensionality
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-dcNhrSPmoY.mp4
PCA Mini-Project
PCA Mini-Project Intro
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rR68JXwKBxE.mp4
PCA Mini-Project
Our discussion of PCA spent a lot of time on theoretical issues, so in this mini-project we’ll ask you to play around with some sklearn code. The eigenfaces code is interesting and rich enough to serve as the testbed for this entire mini-project.
The starter code can be found in pca/eigenfaces.py. This was mostly taken from the example found here, on the sklearn documentation.
Take note when running the code that there are changes in one of the parameters for the SVC function called on line 94 of pca/eigenfaces.py. For the ‘class_weight’ parameter, the argument string “auto” is a valid value for sklearn version 0.16 and prior, but will be deprecated by 0.19. If you are running sklearn version 0.17 or later, the expected argument string should be “balanced”. If you get an error or warning when running pca/eigenfaces.py, make sure that you have the correct argument on line 98 that matches your installed version of sklearn.
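A hedged snippet of that version-dependent argument; the kernel and C value are illustrative, not necessarily the ones used in eigenfaces.py:

```python
from sklearn.svm import SVC

# scikit-learn 0.17 and later:
clf = SVC(kernel='rbf', C=1000.0, class_weight='balanced')

# scikit-learn 0.16 and earlier:
# clf = SVC(kernel='rbf', C=1000.0, class_weight='auto')
```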
Feature Transformation
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/J9JsMNownYM.mp4
Feature Transformation
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B6mPphwAXZk.mp4
Summary
What we have learned
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/74oyGTdFp0Y.mp4
UNSUPERVISED LEARNING PROJECT
Things you will learn by completing this project:
- How to apply preprocessing techniques such as feature scaling and outlier detection (see the sketch after this list).
- How to interpret data points that have been scaled, transformed, or reduced from PCA.
- How to analyze PCA dimensions and construct a new feature space.
- How to optimally cluster a set of data to find hidden patterns in a dataset.
- How to assess information given by cluster data and use it in a meaningful way.
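As referenced in the first bullet above, here is a sketch of one common approach to those two preprocessing steps, on made-up spending data: a log transform to reduce skew, and Tukey's 1.5 x IQR rule to flag outliers (the project's exact choices may differ):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Milk": [9810, 1897, 8808, 35, 440, 120000]})  # made-up values

log_data = np.log(data)  # log-scaling for a heavily skewed feature

Q1 = np.percentile(log_data["Milk"], 25)
Q3 = np.percentile(log_data["Milk"], 75)
step = 1.5 * (Q3 - Q1)   # Tukey's rule for flagging outliers

outliers = log_data[(log_data["Milk"] < Q1 - step) | (log_data["Milk"] > Q3 + step)]
print(outliers)
```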
jupyter notebook customer_segments.ipynb
Creating Customer Segments project rubric
submit project
review
second review
git project:
- Open git bash and type cd <path> (escape spaces with \ and use / between directories), then run:
git init
git status
git add <>
git commit -m "description"
- Create a repo in GitHub, then:
git remote add origin URL
git push origin master
- For the second submit:
git add <>
git commit -m "second submit"
git push origin master
Identifying customers by clustering them.
Overview
Project Overview
In this project you will apply unsupervised learning techniques on product spending data collected for customers of a wholesale distributor in Lisbon, Portugal to identify customer segments hidden in the data. You will first explore the data by selecting a small subset to sample and determine if any product categories highly correlate with one another. Afterwards, you will preprocess the data by scaling each product category and then identifying (and removing) unwanted outliers. With the good, clean customer spending data, you will apply PCA transformations to the data and implement clustering algorithms to segment the transformed customer data. Finally, you will compare the segmentation found with an additional labeling and consider ways this information could assist the wholesale distributor with future service changes.
Project Highlights
This project is designed to give you a hands-on experience with unsupervised learning and work towards developing conclusions for a potential client on a real-world dataset. Many companies today collect vast amounts of data on customers and clientele, and have a strong desire to understand the meaningful relationships hidden in their customer base. Being equipped with this information can assist a company engineer future products and services that best satisfy the demands or needs of their customers.
Things you will learn by completing this project:
- How to apply preprocessing techniques such as feature scaling and outlier detection.
- How to interpret data points that have been scaled, transformed, or reduced from PCA.
- How to analyze PCA dimensions and construct a new feature space.
- How to optimally cluster a set of data to find hidden patterns in a dataset.
- How to assess information given by cluster data and use it in a meaningful way.
Software Requirements
Description
A wholesale distributor recently tested a change to their delivery method for some customers, by moving from a morning delivery service five days a week to a cheaper evening delivery service three days a week. Initial testing did not discover any significant unsatisfactory results, so they implemented the cheaper option for all customers. Almost immediately, the distributor began getting complaints about the delivery service change and customers were canceling deliveries — losing the distributor more money than what was being saved. You’ve been hired by the wholesale distributor to find what types of customers they have to help them make better, more informed business decisions in the future. Your task is to use unsupervised learning techniques to see if any similarities exist between customers, and how to best segment customers into distinct categories.
Software and Libraries
This project uses the following software and Python libraries:
Python 2.7
NumPy
pandas
scikit-learn (v0.17)
matplotlib
You will also need to have software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
Starting the Project
For this assignment, you can find the customer_segments
folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects
we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains three files:
- customer_segments.ipynb: This is the main file where you will be performing your work on the project.
- customers.csv: The project dataset. You'll load this data in the notebook.
- visuals.py: This Python script provides supplementary visualizations for the project. Do not modify.
In the Terminal or Command Prompt, navigate to the folder containing the project files, and then use the command jupyter notebook customer_segments.ipynb
to open up a browser window or tab to work with your notebook. Alternatively, you can use the command jupyter notebook or ipython notebook
and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the project. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Creating Customer Segments project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named customer_segments
for ease of access:
- The customer_segments.ipynb notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Creating Customer Segments
A wholesale distributor recently tested a change to their delivery method for some customers, by moving from a morning delivery service five days a week to a cheaper evening delivery service three days a week. Initial testing did not discover any significant unsatisfactory results, so they implemented the cheaper option for all customers. Almost immediately, the distributor began getting complaints about the delivery service change and customers were canceling deliveries — losing the distributor more money than what was being saved. You’ve been hired by the wholesale distributor to find what types of customers they have to help them make better, more informed business decisions in the future. Your task is to use unsupervised learning techniques to see if any similarities exist between customers, and how to best segment customers into distinct categories.
Project Files
For this assignment, you can find the customer_segments folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
Evaluation
Your project will be reviewed by a Udacity reviewer against the Creating Customer Segments project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named customer_segments for ease of access:
- The customer_segments.ipynb notebook file with all questions answered and all code cells executed and displaying output.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
Supporting Materials
Videos Zip File
Career: Networking
In the following lesson, you will learn how to tell your unique story to recruiters in a succinct and professional but relatable way.
After completing these lessons, be sure to complete the online profile review projects, such as LinkedIn Profile Review.
If you are a Nanodegree Plus student, Career Content and Career Development Projects are required for graduation.
If you are enrolled in a standard Nanodegree program, Career Content and Career Development Projects are optional and do not affect your graduation.
NETWORKING
Develop Your Personal Brand
Why Network?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/exjEm9Paszk.mp4
Elevator Pitch
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/S-nAHPrkQrQ.mp4
Personal Branding
How to Stand Out
Imagine you’re a hiring manager for a company, and you need to pick 5 people to interview for a role. But you get 50 applications, and everyone seems pretty qualified. How do you compare job candidates? You’ll probably pick the candidates that stand out the most to you.
Personal Stories
The thing that always makes a job candidate unique is their personal story - their passion and how they got there. Employers aren’t just looking for someone with the skills, but they’re looking for someone who can drive the company’s mission and will be a part of innovation. That’s why they need to know your work ethic and what drives you.
As someone wanting to impress an employer, you need to tell your personal story. You want employers to know how you solve problems, overcome challenges, achieve results. You want employers to know what excites you, what motivates you, what drives you forward.
All of this can be achieved through effective storytelling, and effective branding.
I’ll let you know I’ve branded and rebranded myself many times. That’s okay - people are complex and have multiple interests that change over time.
In this next video, we’ll meet my coworker Chris who will show us how he used personal branding to help him in his recent career change.
Resources
Blog post: Storytelling, Personal Branding, and Getting Hired
Meet Chris
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ccflD9x5WU.mp4
Resources
Blog post: Overcome Imposter Syndrome
Elevator Pitch
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0QtgTG49E9I.mp4
Pitching to a Recruiter
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LxAdWaA-qTQ.mp4
Use Your Elevator Pitch
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/e-v60ieggSs.mp4
Optimize Your LinkedIn Profile
Why LinkedIn
LinkedIn is the most popular professional networking platform out there, so most recruiters use it to find job seekers. It’s so common for hiring teams to use LinkedIn to find and look at candidates, that it’s almost a red flag if they’re unable to find a LinkedIn profile for you.
It’s also a great platform for you to connect with other people in your field. Udacity for example has an Alumni LinkedIn group where graduates can collaborate on projects, practice job interviews, or discuss new trends in the industry together. Connecting with a fellow alum and asking for a referral would increase your chances of getting an interview.
Find Connections
The best way to use your LinkedIn effectively, however, is to have over 500 connections.
This may seem like a lot, but once you get rolling, you’ll get to that number fast. After you actively start using it, by joining groups and going to networking events, your number of connections will climb. You are more likely to show up in search results on LinkedIn if you have more connections, which means you’ll be more visible to recruiters.
Join Groups
Increasing the group of people you’re connected with also exposes you to what they’re working on or have done. For example, if you move to a new city, you can search your network to see who lives in the area, and ask for recommendations on apartment hunting, job leads, or other advice on adjusting to life in another city.
Also, if you’re active in a LinkedIn group or if you frequently write LinkedIn blog posts, you’ll increase your visibility on the platform and likelihood that a recruiter will find your profile.
How to Build Your LinkedIn Profile
LinkedIn guides you well when filling out your profile. It tells you if your profile is strong and offers recommendations on how to improve it. We recommend you follow LinkedIn’s advice because it’ll increase your visibility on the network, thus increasing the number of opportunities you may come across.
Tips for an Awesome LinkedIn Profile
In the lessons on conducting a successful job search and resume writing, we talk about how you can describe your work experiences in a way that targets a specific job.
Use what you learn to describe your experiences in LinkedIn’s projects and work sections. You can even copy and paste over the bullet points in your resume to the work or project sections of LinkedIn. Making sure your resume and LinkedIn are consistent helps build your personal brand.
Find Other Networking Platforms
Remember that LinkedIn isn’t the only professional networking platform out there. If you do have a great LinkedIn profile, that means you can also build an amazing profile on other platforms. Find some recommendations for online profiles on the Career Resource Center.
Up Next
By now, you know how to target your job profile to your dream job. You know how to market yourself effectively through building off your elevator pitch. Being confident in this will help you network naturally, whether on LinkedIn or at an event in-person.
Move on to the LinkedIn Profile Review and get personalized feedback on your online presence.
Networking Your Way to a New Job
Career and Job Fairs Do’s and Don’ts
What are career mixers?
GitHub Profile Review
- Project Rubric. Your project will be reviewed by a Udacity Career Reviewer against this rubric.
- Project checklist. Based on the project rubric, this is a handy checklist on GitHub best practices.
- Career Resource Center. Find additional tips and guides on developing your GitHub Profile.
- Rubrics
- submit
- reviews
- Third reviews
LinkedIn Profile Review
Udacity Professional Profile Review
Reinforcement Learning
DUE OCT 19
Use Reinforcement Learning algorithms like Q-Learning to train artificial agents to take optimal actions in an environment.
Project: Train a Smartcab to Drive
For most students, this project takes approximately 15 - 21 hours to complete (about 2 - 3 weeks).
P4 Train a Smartcab to Drive
Markov Decision Processes
- Further details on this quiz can be found in Chapter 17 of Artificial Intelligence: A Modern Approach
REINFORCEMENT LEARNING
- Andrew Moore’s slides on Zero-Sum Games
- Andrew Moore’s slides on Non-Zero-Sum Games
- This paper offers a summary and an investigation of the field of reinforcement learning. It’s long, but chock-full of information!
PROJECT
Software Requirements
pygame
Common Problems with PyGame
- Getting Started
- PyGame Information
- Google Group
- PyGame subreddit
- use the discussion forums
- MLND Student Slack Community
Train a Smartcab to Drive project rubric
submit
windows + r
type pip install pygame
review
REINFORCEMENT LEARNING
Introduction to Reinforcement Learning
Reinforcement Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PeAHckcWFS0.mp4
What You’ll Watch and Learn
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Z6ATPu4b9nc.mp4
Reinforcement Learning What You’ll Do
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1vQQphPLnkM.mp4
Markov Decision processes
Introduction
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_ocNerSvh5Y.mp4
Reinforcement Learning
Reinforcement Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HeYSFWPX_4k.mp4
Rat Dinosaurs
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h7ExhVneBDU.mp4
GAME THEORY
Game Theory
Game Theory
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vYHk1SPpnmQ.mp4
What Is Game Theory?
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jwlteKFyiHU.mp4
PROJECT
Train a cab to drive itself.
Overview
Software Requirements
Description
In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents, known as smartcabs, to transport people from one location to another within the cities those companies operate. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to depend on smartcabs to get to where they need to go as safely and reliably as possible. Although smartcabs have become the transport of choice, concerns have arisen that a self-driving agent might not be as safe or reliable as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee for a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real-time to prove that both safety and reliability can be achieved.
Software Requirements
This project uses the following software and Python libraries:
- Python 2.7
- NumPy
- pandas
- matplotlib
- PyGame
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer. pygame can then be installed using one of the following commands:
Mac: conda install -c https://conda.anaconda.org/quasiben pygame
Linux: conda install -c https://conda.anaconda.org/tlatorre pygame
Windows: conda install -c https://conda.anaconda.org/prkrekel pygame
Please note that installing pygame can be done using pip as well.
You can run an example to make sure pygame is working before actually performing the project by running:
python -m pygame.examples.aliens
Common Problems with PyGame
Fixing Common PyGame Problems
The PyGame library can in some cases require a bit of troubleshooting to work correctly for this project. While the PyGame aspect of the project is not required for a successful submission (you can complete the project without a visual simulation, although it is more difficult), it is very helpful to have it working! If you encounter an issue with PyGame, first see these helpful links below that are developed by communities of users working with the library:
Problems most often reported by students
“PyGame won’t install on my machine; there was an issue with the installation.”
Solution: As has been recommended for previous projects, Udacity suggests that you use the Anaconda distribution of Python, which then allows you to install PyGame through the conda-specific command.
“I’m seeing a black screen when running the code; output says that it can’t load car images.”
Solution: The code will not operate correctly unless it is run from the top-level directory for smartcab. The top-level directory is the one that contains the README and the project notebook.
If you continue to have problems with the project code in regards to PyGame, you can also use the discussion forums to find posts from students that encountered issues that you may be experiencing. Additionally, you can seek help from a swath of students in the MLND Student Slack Community.
Starting the Project
For this assignment, you can find the smartcab
folder containing the necessary project files on the Machine Learning projects GitHub, under the projects
folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains three directories:
- /logs/: This folder will contain all log files that are given from the simulation when specific prerequisites are met.
- /images/: This folder contains various images of cars to be used in the graphical user interface. You will not need to modify or create any files in this directory.
- /smartcab/: This folder contains the Python scripts that create the environment, graphical user interface, the simulation, and the agents. You will not need to modify or create any files in this directory except for agent.py.
It also contains two files:
- smartcab.ipynb: This is the main file where you will answer questions and provide an analysis for your work.
- visuals.py: This Python script provides supplementary visualizations for the analysis. Do not modify.
Finally, in /smartcab/ are the following four files:
- Modify:
  - agent.py: This is the main Python file where you will be performing your work on the project.
- Do not modify:
  - environment.py: This Python file will create the smartcab environment.
  - planner.py: This Python file creates a high-level planner for the agent to follow towards a set goal.
  - simulator.py: This Python file creates the simulation and graphical user interface.
Running the Code
In a terminal or command window, navigate to the top-level project directory smartcab/
(that contains the three project directories) and run one of the following commands:
python smartcab/agent.py
or python -m smartcab.agent
This will run the agent.py
file and execute your implemented agent code into the environment. Additionally, use the command jupyter notebook smartcab.ipynb
from this same directory to open up a browser window or tab to work with your analysis notebook. Alternatively, you can use the command jupyter notebook
or ipython notebook
and navigate to the notebook file in the browser window that opens. Follow the instructions in the notebook and answer each question presented to successfully complete the implementation necessary for your agent.py
agent file. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project.
Definitions
Environment
The smartcab operates in an ideal, grid-like city (similar to New York City), with roads going in the North-South and East-West directions. Other vehicles will certainly be present on the road, but there will be no pedestrians to be concerned with. At each intersection there is a traffic light that either allows traffic in the North-South direction or the East-West direction. U.S. Right-of-Way rules apply:
- On a green light, a left turn is permitted if there is no oncoming traffic making a right turn or coming straight through the intersection.
- On a red light, a right turn is permitted if no oncoming traffic is approaching from your left through the intersection. To understand how to correctly yield to oncoming traffic when turning left, you may refer to this official drivers’ education video, or this passionate exposition.
Inputs and Outputs
Assume that the smartcab is assigned a route plan based on the passengers’ starting location and destination. The route is split at each intersection into waypoints, and you may assume that the smartcab, at any instant, is at some intersection in the world. Therefore, the next waypoint to the destination, assuming the destination has not already been reached, is one intersection away in one direction (North, South, East, or West). The smartcab has only an egocentric view of the intersection it is at: It can determine the state of the traffic light for its direction of movement, and whether there is a vehicle at the intersection for each of the oncoming directions. For each action, the smartcab may either idle at the intersection, or drive to the next intersection to the left, right, or ahead of it. Finally, each trip has a time to reach the destination which decreases for each action taken (the passengers want to get there quickly). If the allotted time becomes zero before reaching the destination, the trip has failed.
Rewards and Goal
The smartcab will receive positive or negative rewards based on the action it has taken. Expectedly, the smartcab will receive a small positive reward when making a good action, and a varying amount of negative reward dependent on the severity of the traffic violation it would have committed. Based on the rewards and penalties the smartcab receives, the self-driving agent implementation should learn an optimal policy for driving on the city roads while obeying traffic rules, avoiding accidents, and reaching passengers’ destinations in the allotted time.
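For orientation, a hedged sketch of the Q-learning update an agent like this could use; the state encoding and the parameter values are illustrative, not the project's required design:

```python
import random

Q = {}                                   # maps (state, action) -> estimated value
actions = [None, 'forward', 'left', 'right']
alpha, gamma, epsilon = 0.5, 0.0, 0.1    # learning rate, discount, exploration rate

def choose_action(state):
    if random.random() < epsilon:        # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # otherwise exploit

def learn(state, action, reward, next_state):
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    # Move the estimate toward reward + gamma * best future value.
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Example state: (next_waypoint, light, oncoming, left) as observed at an intersection.
state = ('forward', 'green', None, None)
learn(state, 'forward', 2.0, ('left', 'red', None, None))
print(Q)
```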
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Train a Smartcab to Drive project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named smartcab
for ease of access:
- The agent.py Python file with all code implemented as required in the instructed tasks.
- The /logs/ folder which should contain five log files that were produced from your simulation and used in the analysis.
- The smartcab.ipynb notebook file with all questions answered and all visualization cells executed and displaying results.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Train a Smartcab to Drive
In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents — known as smartcabs — to transport people from one location to another within the cities those companies operate. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to rely on smartcabs to get to where they need to go as safely and efficiently as possible. Although smartcabs have become the transport of choice, concerns have arisen that a self-driving agent might not be as safe or efficient as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee for a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real-time to prove that both safety and efficiency can be achieved.
Project Files
For this assignment, you can find the smartcab
folder containing the necessary project files on the Machine Learning projects GitHub, under the projects
folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
Evaluation
Your project will be reviewed by a Udacity reviewer against the Train a Smartcab to Drive project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named smartcab
for ease of access:
- The agent.py Python file with all code implemented as required in the instructed tasks.
- The /logs/ folder which should contain five log files that were produced from your simulation and used in the analysis.
- The smartcab.ipynb notebook file with all questions answered and all visualization cells executed and displaying results.
- An HTML export of the project notebook with the name report.html. This file must be present for your project to be evaluated.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of this page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
Supporting Materials
View Submission
Deep Learning
P5 Build a Digit Recognition Program
FROM MACHINE LEARNING TO DEEP LEARNING
SOFTWARE AND TOOLS
Download and Setup
Method 1: Pre-built Docker container with TensorFlow and all assignments
To get started with TensorFlow quickly and work on your assignments, follow the instructions in this README.
Note: If you are on a Windows machine, Method 1 is your only option due to lack of native TensorFlow support.
(not needed) Check your GPU
Right-click Computer -> Properties -> Device Manager -> Display adapters
I use the CPU-only method.
(failed) First try from discussion at Udacity
- Install Docker Toolbox (you can get it here). I recommend installing every optional package. ->failed
- Create a virtual machine for your udacity tensorflow work:
docker-machine create -d virtualbox --virtualbox-memory 2048 tensorflow
- In a cmd.exe prompt, run
FOR /f "tokens=*" %i IN ('docker-machine env --shell cmd tensorflow') DO %i
- Next, run
docker run -p 8888:8888 --name tensorflow-udacity -it b.gcr.io/tensorflow-udacity/assignments:0.5.0
- In a browser, go to
http://192.168.99.100:8888/tree
(failed) Second try
I have 2 versions of Python installed, so I will not use this one.
- Click here and follow the instructions.
- Download Python 3.5.3 and choose Windows x86-64 executable installer -> install Python 3.5.x and add it to PATH.
windows + r
->pip3 install --upgrade tensorflow
(failed) Third try from discussion at Udacity
windows + r
ohe = preprocessing.OneHotEncoder()  # create a OneHotEncoder object
label_encoded_data = label_encoder.fit_transform(data['health'])
ohe.fit_transform(label_encoded_data.reshape(-1, 1))
After executing the above steps, I can use tensorflow by selecting the following option in Jupyter notebook: Kernel => Change kernel => python [conda env:py35]
Note: I used python 2.7 and jupyter notebook for the earlier assignments.
(Useful) Fourth method
Follow this video and install Ubuntu in Virtualbox.
Virtual hard disk file location: C:\Users\SSQ\VirtualBox VMs\Deep Learning Ubuntu\Deep Learning Ubuntu.vdi
Location of the shared folder: C:\Users\SSQ\virtualbox share
Follow this blog to copy files between host OS and guest OS.
For me I use sudo mount -t vboxsf virtualbox_share /mnt/
Follow this TensorFlow
For Mac OS, follow this video
register mega
https://www.tensorflow.org/get_started/os_setup#pip_installation_on_windows
(success) Fifth try with pip install
Follow this website
When I type pip install tensorflow
in Virtualbox (OS:Linux),
it always shows ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.
,
so I choose sudo pip install --upgrade https://pypi.tuna.tsinghua.edu.cn/packages/7b/c5/a97ed48fcc878e36bb05a3ea700c077360853c0994473a8f6b0ab4c2ddd2/tensorflow-1.0.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=a7483a4da4d70cc628e9e207238f77c0
to install tensorflow
Collecting numpy>=1.11.0 (from tensorflow==1.0.0)
Downloading numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl (16.5MB)
sudo pip install --upgrade https://pypi.python.org/packages/cb/47/19e96945ee6012459e85f87728633f05b1e8791677ae64370d16ac4c849e/numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=9f9bc53d2e281831e1a75be0c09a9548
From this mirror
sudo pip install --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/cb/47/19e96945ee6012459e85f87728633f05b1e8791677ae64370d16ac4c849e/numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=9f9bc53d2e281831e1a75be0c09a9548
Trying again succeeds: pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ tensorflow
Validate your installation
$ python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Hello, TensorFlow!
(success) Sixth try with Anaconda install
Follow this website
Download anaconda in the VirtualBox
For me it shows a ReadTimeoutError, so I decide to download it on my host OS and copy it to my shared folder C:\Users\SSQ\virtualbox share; then I can find it under /mnt from my Linux system.
type bash /mnt/Anaconda2-4.3.0-Linux-x86_64.sh
type yes
Anaconda2 will now be installed into this location:/home/ssq/anaconda2
Press ENTER to confirm the location
Press CTRL-C to abort the installation
Or specify a different location below
Press Enter.
Do you wish the installer to prepend the Anaconda2 install location
to PATH in your /home/ssq/.bashrc ? [yes|no]
yes
Open a new terminal and type conda create -n tensorflow
Fetching package metadata …
CondaHTTPError: HTTP None None for url
Elapsed: None
An HTTP error occurred when trying to retrieve this URL.
ConnectionError(ReadTimeoutError("HTTPSConnectionPool(host='repo.continuum.io', port=443): Read timed out.",),)
Try again conda create -n tensorflow
source activate tensorflow
From ssq@ssq-VirtualBox:~$
to (tensorflow) ssq@ssq-VirtualBox:~$
Success
pip install --index https://pypi.mirrors.ustc.edu.cn/simple/ tensorflow
Validate your installation
$ python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Hello, TensorFlow!
source deactivate tensorflow
From (tensorflow) ssq@ssq-VirtualBox:~$
to ssq@ssq-VirtualBox:~$
(failed) Docker install
Install docker with sudo apt install docker.io
**Ubuntu 16.04 GPU TensorFlow install**
mkdir – make a directory on the Linux server
rm – remove a directory on the Linux server
wget – download files
sh – run the install .sh file
- install Anaconda (reference to this article)
- install cuDNN
- install CUDA
- install tensorflow-gpu directly within your environment (reference to this page)
Assignments
Assignments
Note: If you installed TensorFlow using the pre-built Docker container, you do not have to fetch assignment code separately. Just run the container and access the notebooks as mentioned here.
Get Starter Code
Starter code packages (Jupyter notebooks) are available from the main TensorFlow repository. Clone it and navigate to the tensorflow/examples/udacity/ directory. This contains all the Jupyter notebooks (.ipynb files) as well as a Docker spec (Dockerfile).
Run
Depending on how you installed TensorFlow, do one of the following to run assignment code:
**Pip/virtualenv**: Run `ipython notebook` and open http://localhost:8888 in a browser.
**Docker**: As mentioned in README.md:
First build a local Docker container: docker build -t $USER/assignments .
Run the container: docker run -p 8888:8888 -it --rm $USER/assignments
Now find your VM's IP using docker-machine ip default (say, 192.168.99.100) and open http://192.168.99.100:8888
You should be able to see a list of notebooks, one for each assignment. Click on the appropriate one to open it, and follow the inline instructions.
And you’re ready to start exploring! To get further help on each assignment, navigate to the appropriate node.
If you want to learn more about iPython (or Jupyter) notebooks, visit jupyter.org.
Assignment 1: notMNIST
Assignment 1: notMNIST
Preprocess notMNIST data and train a simple logistic regression model on it
notMNIST dataset samples
Starter Code
Open the iPython notebook for this assignment (1_notmnist.ipynb), and follow the instructions to implement and run each indicated step. Some of the early steps that preprocess the data have been implemented for you.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer the posed questions (save your responses as markdown in the notebook).
In the end, you should have a model trained on the notMNIST dataset, which is able to recognize a subset of English letters in different fonts. How accurately does your model predict the correct labels on the test dataset?
Problem 2: Verify normalized images
Note how imshow() displays an image using a color map. You can change this using the cmap parameter. Check out more options in the API reference.
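For instance, a minimal sketch of the cmap option (using random data as a stand-in for a notMNIST array, so the values here are illustrative only):

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in 28x28 "image"; in the notebook you would pass one of the
# normalized notMNIST arrays instead.
image = np.random.rand(28, 28)

# The default colormap renders the array in color; 'gray' shows it as a
# grayscale letter. Any name from the colormap reference works here.
plt.imshow(image, cmap='gray')
plt.colorbar()
plt.show()
```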
DEEP NEURAL NETWORKS
Deep Neural Networks
Assignment 2: SGD
Assignment 2: Stochastic Gradient Descent
Train a fully-connected network using Gradient Descent and Stochastic Gradient Descent
Note: The assignments in this course build on each other, so please finish Assignment 1 before attempting this.
Starter Code
Open the iPython notebook for this assignment (2_fullyconnected.ipynb), and follow the instructions to implement and/or run each indicated step. Some steps have been implemented for you.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).
Your new model should perform better than the one you developed for Assignment 1. Also, the time required to train using Stochastic Gradient Descent (SGD) should be considerably less than simple Gradient Descent (GD).
Errors
Error:
ValueError: Only call softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)
Fix:
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
Assignment 3: Regularization
Assignment 3: Regularization
Use regularization techniques to improve a deep learning model
Note: The assignments in this course build on each other, so please finish them in order.
Starter Code
Open the iPython notebook for this assignment (3_regularization.ipynb), and follow the instructions to implement and run each indicated step. Some steps have been implemented for you.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).
Try to apply the different regularization techniques you have learnt, and compare their results. Which seems to work better? Is one clearly better than the others?
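As a point of reference, here is a minimal sketch of how L2 regularization and dropout might be bolted onto the one-hidden-layer network from Assignment 2. The layer sizes, the beta value, and the keep probability are assumptions to tune against the validation set, not prescribed values:

```python
import tensorflow as tf

image_size, num_labels, num_hidden = 28, 10, 1024
beta = 1e-3        # L2 penalty strength (assumption)
keep_prob = 0.5    # dropout keep probability during training (assumption)

x = tf.placeholder(tf.float32, [None, image_size * image_size])
y = tf.placeholder(tf.float32, [None, num_labels])

w1 = tf.Variable(tf.truncated_normal([image_size * image_size, num_hidden]))
b1 = tf.Variable(tf.zeros([num_hidden]))
w2 = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
b2 = tf.Variable(tf.zeros([num_labels]))

hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
hidden = tf.nn.dropout(hidden, keep_prob)   # dropout on the hidden layer
logits = tf.matmul(hidden, w2) + b2

# Cross entropy plus an L2 penalty on the weight matrices.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
loss += beta * (tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2))
```

When comparing techniques, remember to apply dropout only while training and to report accuracy on the validation set without it.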
Error in VirtualBox
Error:
How to fix:
Close all the processes on the host OS to free up memory.
Restart VM
CONVOLUTIONAL NEURAL NETWORKS
Readings
Readings
For a closer look at the arithmetic behind convolution, and how it is affected by your choice of padding scheme, stride and other parameters, please refer to this illustrated guide:
V. Dumoulin and F. Visin, A guide to convolution arithmetic for deep learning.
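The key relation from that guide can be checked with a one-line helper (a sketch; padding here means the amount of explicit zero padding added to each side):

```python
def conv_output_size(input_size, kernel_size, padding, stride):
    """Spatial output size of a convolution, per the arithmetic guide."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Example: a 28x28 input, 5x5 kernel, no padding, stride 2 -> 12x12 output.
print(conv_output_size(28, kernel_size=5, padding=0, stride=2))  # 12
```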
Assignment 4: Convolutional Models
Design and train a Convolutional Neural Network
Note: The assignments in this course build on each other, so please finish them in order.
Starter Code
Open the iPython notebook for this assignment (4_convolutions.ipynb), and follow the instructions to implement and run each indicated step. Some steps have been implemented for you.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).
Improve the model by experimenting with its structure - how many layers, how they are connected, stride, pooling, etc. For more efficient training, try applying techniques such as dropout and learning rate decay. What does your final architecture look like?
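If you want to try learning rate decay, one way to sketch it in TensorFlow 1.x is tf.train.exponential_decay; the starting rate, decay steps, and decay rate below are placeholders to experiment with, and the tiny loss is only a stand-in for your model's cross-entropy:

```python
import tensorflow as tf

# Stand-in loss so the snippet is self-contained; use your model's loss instead.
w = tf.Variable(5.0)
loss = tf.square(w)

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    0.05,               # starting learning rate (assumption)
    global_step,
    decay_steps=1000,   # decay every 1000 training steps (assumption)
    decay_rate=0.9,
    staircase=True)

# Passing global_step makes the optimizer increment it, so the rate decays.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)
```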
DEEP MODELS FOR TEXT AND SEQUENCES
tSNE
Assignment 5: Word2Vec and CBOW
Assignment 5: Word2Vec and CBOW
Train a skip-gram model on Text8 data and visualize the output
Note: The assignments in this course build on each other, so please finish them in order.
Starter Code
Open the iPython notebook for this assignment (5_word2vec.ipynb), and follow the instructions to implement and run each indicated step. The first model (Word2Vec) has been implemented for you. Using that as a reference, train a CBOW (Continuous Bag of Words) model.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).
How does your CBOW model perform compared to the given Word2Vec model?
Open
sudo mount -t vboxsf virtualbox_share /mnt/
jupyter notebook
Run
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.
Method: tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, train_labels, embed, num_sampled, vocabulary_size))
Reference:
https://github.com/nlintz/TensorFlow-Tutorials/issues/80
Assignment 6: LSTMs
Assignment 6: LSTMs
Train a Long Short-Term Memory network to predict character sequences
Note: The assignments in this course build on each other, so please finish them in order.
Starter Code
Open the iPython notebook for this assignment (6_lstm.ipynb), and follow the instructions to implement and run each indicated step. A basic LSTM model has been provided; improve it by solving the given problems.
Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).
What changes did you make to use bigrams as input instead of individual characters? Were you able to implement the sequence-to-sequence LSTM? If so, what additional challenges did you have to solve?
Run
AttributeError: 'module' object has no attribute 'concat_v2'
ValueError: Only call softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)
Method
Have a try
pip uninstall tensorflow
pip install --ignore-installed --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb
sudo pip install --index https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb
sudo pip install --upgrade https://mirrors.ustc.edu.cn/pypi/web/packages/01/c5/adefd2d5c83e6d8b4a8efa5dd00e44dc05de317b744fb58aef6d8366ce2b/tensorflow-0.12.0-cp27-cp27mu-manylinux1_x86_64.whl#md5=ebcd1b32ccf2279bfa688542cbdad5fb
PROJECT
(new) Deep Learning
MACHINE LEARNING TO DEEP LEARNING
Deep Learning
Deep Learning
Up to this point you’ve been introduced to a number of different learning schemes that take place in machine learning. You’ve seen supervised learning, where we try to extrapolate labels for new data given labelled data we already have. You’ve seen unsupervised learning, where we try to classify data into groups and extract new information hidden in the data. Lastly, you’ve seen reinforcement learning, where we try to create a model that learns the rules of an environment to best maximize its return or reward.
In this lesson, you’ll learn about a relatively new branch of machine learning called deep learning, which attempts to model high-level abstractions about data using networks of graphs. Deep learning, much like the other branches of machine learning you’ve seen, is similarly focused on learning representations in data. Additionally, modeling high-level abstractions about data is very similar to artificial intelligence — the idea that knowledge can be represented and acted upon intelligently.
What You’ll Watch and Learn
For this lesson, you’ll want to learn about algorithms that help you to construct the deep network graphs necessary to model high-level abstractions about data. In addition, you’ll also want to learn how to construct deep models that can interpret and identify words and letters in text — just like how a human reads! To do that, you’ll work on Udacity’s Deep Learning course, co-authored by Google. Vincent Vanhoucke, Principal Scientist at Google Brain, will be your instructor for this lesson. With Vincent as your guide, you’ll learn the ins and outs of Deep Learning and TensorFlow, which is Google’s Deep Learning framework.
Deep Learning What You’ll Do
In this lesson, you’ll learn how you can develop algorithms that are suitable for modeling high-level abstractions of data and create a type of “intelligence” that is able to use this abstraction for processing new information. First, you’ll learn about deep neural networks — artificial neural networks that have multiple hidden layers of information between their inputs and outputs. Next, you’ll learn about convolutional neural networks — a different flavor of neural networks that are modeled after biological processes like visual and aural feedback. Finally, you’ll learn about deep models for sequence learning — models that can “understand” written and spoken language and text.
The underlying lesson from these concepts is that, with enough data and time to learn, we can develop intelligent agents that think and act in many of the same ways we as humans do. Being able to model complex human behaviors and tasks like driving a car, processing spoken language, or even building a winning strategy for the game of Go, is a task that could not be done without use of deep learning.
Software and Tools
TensorFlow
TensorFlow
We will be using TensorFlow™, an open-source library developed by Google, to build deep learning models throughout the course. Coding will be in Python 2.7 using iPython notebooks, which you should be familiar with.
Download and Setup
Method 1: Pre-built Docker container with TensorFlow and all assignments
To get started with TensorFlow quickly and work on your assignments, follow the instructions in this README.
Note: If you are on a Windows machine, Method 1 is your only option due to lack of native TensorFlow support.
– OR –
Method 2: Install TensorFlow on your computer (Linux or Mac OS X only), then fetch assignment code separately
Follow the instructions to download and setup TensorFlow. Choose one of the three ways to install:
Pip: Install TensorFlow directly on your computer. You need to have Python 2.7 and pip installed; and this may impact other Python packages that you may have.
Virtualenv: Install TensorFlow in an isolated (virtual) Python environment. You need to have Python 2.7 and virtualenv installed; this will not affect Python packages in any other environment.
Docker: Run TensorFlow in an isolated Docker container (virtual machine) on your computer. You need to have Vagrant, Docker and virtualization software like VirtualBox installed; this will keep TensorFlow completely isolated from the rest of your computer, but may require more memory to run.
Links: Tutorials, How-Tos, Resources, Source code, Stack Overflow
INTRO TO TENSORFLOW
Intro to TensorFlow
What is Deep Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/INt1nULYPak.mp4
Solving Problems - Big and Small
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/WHcRQMGSbqg.mp4
Let’s Get Started
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ySIDqaXLhHw.mp4
Installing TensorFlow
Throughout this lesson, you’ll apply your knowledge of neural networks on real datasets using TensorFlow (link for China), an open source Deep Learning library created by Google.
You’ll use TensorFlow to classify images from the notMNIST dataset - a dataset of images of English letters from A to J. You can see a few example images below.
Your goal is to automatically detect the letter based on the image in the dataset. You’ll be working on your own computer for this lab, so, first things first, install TensorFlow!
Install
As usual, we’ll be using Conda to install TensorFlow. You might already have a TensorFlow environment, but check to make sure you have all the necessary packages.
OS X or Linux
Run the following commands to setup your environment:
conda create -n tensorflow python=3.5
source activate tensorflow
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow
Windows
And to install on Windows, run the following in your console or Anaconda shell:
conda create -n tensorflow python=3.5
activate tensorflow
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow
Hello, world!
Try running the following code in your Python console to make sure you have TensorFlow properly installed. The console will print “Hello, world!” if TensorFlow is installed. Don’t worry about understanding what it does. You’ll learn about it in the next section.
import tensorflow as tf
# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')
with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Try
Open cmd as administrator and run:
conda create -n tensorflow python=3.5
C:\windows\system32>conda create -n tensorflow python=3.5
Fetching package metadata ...........
Solving package specifications: .
Package plan for installation in environment C:\Program Files\Anaconda2\envs\ten
sorflow:
The following NEW packages will be INSTALLED:
pip: 9.0.1-py35_1
python: 3.5.3-0
setuptools: 27.2.0-py35_1
vs2015_runtime: 14.0.25123-0
wheel: 0.29.0-py35_0
Proceed ([y]/n)? y
vs2015_runtime 100% |###############################| Time: 0:00:02 776.58 kB/s
python-3.5.3-0 100% |###############################| Time: 0:01:29 361.95 kB/s
setuptools-27. 100% |###############################| Time: 0:00:00 1.09 MB/s
wheel-0.29.0-p 100% |###############################| Time: 0:00:00 1.55 MB/s
pip-9.0.1-py35 100% |###############################| Time: 0:00:01 997.36 kB/s
#
# To activate this environment, use:
# > activate tensorflow
#
# To deactivate this environment, use:
# > deactivate tensorflow
#
# * for power-users using bash, you must source
#
activate tensorflow
(tensorflow) C:\windows\system32>
conda install pandas matplotlib jupyter notebook scipy scikit-learn
Fetching package metadata ………..
Solving package specifications: .
Package plan for installation in environment C:\Program Files\Anaconda2\envs\ten
sorflow:
The following NEW packages will be INSTALLED:
bleach: 1.5.0-py35_0
colorama: 0.3.7-py35_0
cycler: 0.10.0-py35_0
decorator: 4.0.11-py35_0
entrypoints: 0.2.2-py35_1
html5lib: 0.999-py35_0
icu: 57.1-vc14_0 [vc14]
ipykernel: 4.5.2-py35_0
ipython: 5.3.0-py35_0
ipython_genutils: 0.1.0-py35_0
ipywidgets: 6.0.0-py35_0
jinja2: 2.9.5-py35_0
jpeg: 9b-vc14_0 [vc14]
jsonschema: 2.5.1-py35_0
jupyter: 1.0.0-py35_3
jupyter_client: 5.0.0-py35_0
jupyter_console: 5.1.0-py35_0
jupyter_core: 4.3.0-py35_0
libpng: 1.6.27-vc14_0 [vc14]
markupsafe: 0.23-py35_2
matplotlib: 2.0.0-np112py35_0
mistune: 0.7.4-py35_0
mkl: 2017.0.1-0
nbconvert: 5.1.1-py35_0
nbformat: 4.3.0-py35_0
notebook: 4.4.1-py35_0
numpy: 1.12.0-py35_0
openssl: 1.0.2k-vc14_0 [vc14]
pandas: 0.19.2-np112py35_1
pandocfilters: 1.4.1-py35_0
path.py: 10.1-py35_0
pickleshare: 0.7.4-py35_0
prompt_toolkit: 1.0.13-py35_0
pygments: 2.2.0-py35_0
pyparsing: 2.1.4-py35_0
pyqt: 5.6.0-py35_2
python-dateutil: 2.6.0-py35_0
pytz: 2016.10-py35_0
pyzmq: 16.0.2-py35_0
qt: 5.6.2-vc14_3 [vc14]
qtconsole: 4.2.1-py35_2
scikit-learn: 0.18.1-np112py35_1
scipy: 0.19.0-np112py35_0
simplegeneric: 0.8.1-py35_1
sip: 4.18-py35_0
six: 1.10.0-py35_0
testpath: 0.3-py35_0
tk: 8.5.18-vc14_0 [vc14]
tornado: 4.4.2-py35_0
traitlets: 4.3.2-py35_0
wcwidth: 0.1.7-py35_0
widgetsnbextension: 2.0.0-py35_0
win_unicode_console: 0.5-py35_0
zlib: 1.2.8-vc14_3 [vc14]
Proceed ([y]/n)? y
mkl-2017.0.1-0 100% |###############################| Time: 0:04:46 470.85 kB/s
icu-57.1-vc14_ 100% |###############################| Time: 0:01:28 403.91 kB/s
jpeg-9b-vc14_0 100% |###############################| Time: 0:00:00 379.04 kB/s
openssl-1.0.2k 100% |###############################| Time: 0:00:13 393.72 kB/s
tk-8.5.18-vc14 100% |###############################| Time: 0:00:04 473.45 kB/s
zlib-1.2.8-vc1 100% |###############################| Time: 0:00:00 503.24 kB/s
colorama-0.3.7 100% |###############################| Time: 0:00:00 622.07 kB/s
decorator-4.0. 100% |###############################| Time: 0:00:00 690.00 kB/s
entrypoints-0. 100% |###############################| Time: 0:00:00 625.06 kB/s
ipython_genuti 100% |###############################| Time: 0:00:00 597.35 kB/s
jsonschema-2.5 100% |###############################| Time: 0:00:00 503.91 kB/s
libpng-1.6.27- 100% |###############################| Time: 0:00:01 432.48 kB/s
markupsafe-0.2 100% |###############################| Time: 0:00:00 520.82 kB/s
mistune-0.7.4- 100% |###############################| Time: 0:00:00 441.53 kB/s
numpy-1.12.0-p 100% |###############################| Time: 0:00:10 354.48 kB/s
pandocfilters- 100% |###############################| Time: 0:00:00 363.00 kB/s
path.py-10.1-p 100% |###############################| Time: 0:00:00 293.57 kB/s
pygments-2.2.0 100% |###############################| Time: 0:00:04 302.43 kB/s
pyparsing-2.1. 100% |###############################| Time: 0:00:00 270.85 kB/s
pytz-2016.10-p 100% |###############################| Time: 0:00:00 233.38 kB/s
pyzmq-16.0.2-p 100% |###############################| Time: 0:00:02 266.24 kB/s
simplegeneric- 100% |###############################| Time: 0:00:00 373.89 kB/s
sip-4.18-py35_ 100% |###############################| Time: 0:00:00 268.95 kB/s
six-1.10.0-py3 100% |###############################| Time: 0:00:00 409.00 kB/s
testpath-0.3-p 100% |###############################| Time: 0:00:00 329.72 kB/s
tornado-4.4.2- 100% |###############################| Time: 0:00:02 253.88 kB/s
wcwidth-0.1.7- 100% |###############################| Time: 0:00:00 329.53 kB/s
win_unicode_co 100% |###############################| Time: 0:00:00 302.28 kB/s
cycler-0.10.0- 100% |###############################| Time: 0:00:00 393.21 kB/s
html5lib-0.999 100% |###############################| Time: 0:00:00 260.77 kB/s
jinja2-2.9.5-p 100% |###############################| Time: 0:00:01 250.23 kB/s
pickleshare-0. 100% |###############################| Time: 0:00:00 326.15 kB/s
prompt_toolkit 100% |###############################| Time: 0:00:01 281.79 kB/s
python-dateuti 100% |###############################| Time: 0:00:00 280.81 kB/s
qt-5.6.2-vc14_ 100% |###############################| Time: 0:02:03 469.10 kB/s
scipy-0.19.0-n 100% |###############################| Time: 0:00:20 656.15 kB/s
traitlets-4.3. 100% |###############################| Time: 0:00:00 418.63 kB/s
bleach-1.5.0-p 100% |###############################| Time: 0:00:00 508.29 kB/s
ipython-5.3.0- 100% |###############################| Time: 0:00:02 406.32 kB/s
jupyter_core-4 100% |###############################| Time: 0:00:00 365.87 kB/s
pandas-0.19.2- 100% |###############################| Time: 0:00:13 548.51 kB/s
pyqt-5.6.0-py3 100% |###############################| Time: 0:00:08 586.14 kB/s
scikit-learn-0 100% |###############################| Time: 0:00:16 282.73 kB/s
jupyter_client 100% |###############################| Time: 0:00:00 250.90 kB/s
matplotlib-2.0 100% |###############################| Time: 0:00:17 508.36 kB/s
nbformat-4.3.0 100% |###############################| Time: 0:00:00 1.41 MB/s
ipykernel-4.5. 100% |###############################| Time: 0:00:00 1.39 MB/s
nbconvert-5.1. 100% |###############################| Time: 0:00:00 1.42 MB/s
jupyter_consol 100% |###############################| Time: 0:00:00 397.64 kB/s
notebook-4.4.1 100% |###############################| Time: 0:00:06 890.12 kB/s
qtconsole-4.2. 100% |###############################| Time: 0:00:00 705.98 kB/s
widgetsnbexten 100% |###############################| Time: 0:00:01 727.40 kB/s
ipywidgets-6.0 100% |###############################| Time: 0:00:00 632.13 kB/s
jupyter-1.0.0- 100% |###############################| Time: 0:00:00 665.76 kB/s
ERROR conda.core.link:_execute_actions(330): An error occurred while installing
package 'defaults::qt-5.6.2-vc14_3'.
UnicodeDecodeError('utf8', '\xd2\xd1\xb8\xb4\xd6\xc6 1 \xb8\xf6\xce\xc4\
xbc\xfe\xa1\xa3\r\n', 0, 1, 'invalid continuation byte')
Attempting to roll back.
UnicodeDecodeError('utf8', '\xd2\xd1\xb8\xb4\xd6\xc6 1 \xb8\xf6\xce\xc4\
xbc\xfe\xa1\xa3\r\n', 0, 1, 'invalid continuation byte')
(tensorflow) C:\windows\system32>pip install tensorflow
Hello, Tensor World!
Hello, Tensor World!
Let’s analyze the Hello World script you ran. For reference, I’ve added the code below.
import tensorflow as tf
# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')
with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Tensor
In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:
# A is a 0-dimensional int32 tensor
A = tf.constant(1234)
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789])
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])
tf.constant() is one of many TensorFlow operations you will use in this lesson. The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.
Session
TensorFlow’s api is built around the idea of a computational graph, a way of visualizing a mathematical process which you learned about in the MiniFlow lesson. Let’s take the TensorFlow code you ran and turn that into a graph:
A “TensorFlow Session”, as shown above, is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.
with tf.Session() as sess:
    output = sess.run(hello_constant)
The code has already created the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.
The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.
Quiz: TensorFlow Input
Input
In the last section, you passed a tensor into a session and it returned the result. What if you want to use a non-constant? This is where tf.placeholder() and feed_dict come into play. In this section, you’ll go over the basics of feeding data into TensorFlow.
tf.placeholder()
Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time you’ll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()!
tf.placeholder() returns a tensor that gets its value from data passed to the tf.session.run() function, allowing you to set the input right before the session runs.
Session’s feed_dict
x = tf.placeholder(tf.string)
with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
Use the feed_dict parameter in tf.session.run() to set the placeholder tensor. The above example shows the tensor x being set to the string 'Hello World'. It’s also possible to set more than one tensor using feed_dict as shown below.
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)
with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})
Note: If the data passed to the feed_dict doesn’t match the tensor type and can’t be cast into the tensor type, you’ll get the error “ValueError: invalid literal for...”.
Quiz
Let’s see how well you understand tf.placeholder() and feed_dict. The code below throws an error, but I want you to make it return the number 123. Change line 11, so that the code returns the number 123.
Note: The quizzes are running TensorFlow version 0.12.1. However, all the code used in this course is compatible with version 1.0. We’ll be upgrading our in class quizzes to the newest version in the near future.
# Solution is available in the other "solution.py" tab
import tensorflow as tf
def run():
    output = None
    x = tf.placeholder(tf.int32)

    with tf.Session() as sess:
        # TODO: Feed the x tensor 123
        output = sess.run(x, feed_dict={x: 123})

    return output
Quiz: TensorFlow Math
TensorFlow Math
Getting the input is great, but now you need to use it. You’re going to use basic math functions that everyone knows and loves - add, subtract, multiply, and divide - with tensors. (There are many more math functions you can check out in the documentation.)
Addition
x = tf.add(5, 2) # 7
You’ll start with the add function. The tf.add() function does exactly what you expect it to do. It takes in two numbers, two tensors, or one of each, and returns their sum as a tensor.
Subtraction and Multiplication
Here’s an example with subtraction and multiplication.
x = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5) # 10
The x tensor will evaluate to 6, because 10 - 4 = 6. The y tensor will evaluate to 10, because 2 * 5 = 10. That was easy!
Converting types
It may be necessary to convert between types to make certain operators work together. For example, if you tried the following, it would fail with an exception:
tf.subtract(tf.constant(2.0),tf.constant(1)) # Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32:
That’s because the constant 1 is an integer but the constant 2.0 is a floating point value, and subtract expects them to match.
In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the 2.0 to an integer before subtracting, like so, will give the correct result:
tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1)) # 1
Quiz
Let’s apply what you learned to convert an algorithm to TensorFlow. The code below is a simple algorithm using division and subtraction. Convert the following algorithm in regular Python to TensorFlow and print the results of the session. You can use tf.constant() for the values 10, 2, and 1.
# Solution is available in the other "solution.py" tab
import tensorflow as tf
# TODO: Convert the following to TensorFlow:
x = 10
y = 2
z = x/y - 1
x = tf.constant(x)
y = tf.constant(y)
z = tf.subtract(tf.divide(x, y), tf.cast(tf.constant(1), tf.float64))
# TODO: Print z from a session
with tf.Session() as sess:
    output = sess.run(z)
    print(output)
Transition to Classification
Good job! You’ve accomplished a lot. In particular, you did the following:
- Ran operations in tf.Session.
- Created a constant tensor with tf.constant().
- Used tf.placeholder() and feed_dict to get input.
- Applied the tf.add(), tf.subtract(), tf.multiply(), and tf.divide() functions using numeric data.
- Learned about casting between types with tf.cast().
You know the basics of TensorFlow, so let’s take a break and get back to the theory of neural networks. In the next few videos, you’re going to learn about one of the most popular applications of neural networks - classification.
Supervised Classification
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XTGsutypAPE.mp4
Training Your Logistic Classifier
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/WQsdr1EJgz8.mp4
Quiz: TensorFlow Linear Function
Linear functions in TensorFlow
The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as y = xW + b.
Here, W is a matrix of the weights connecting two layers. The output y, the input x, and the biases b are all vectors.
Weights and Bias in TensorFlow
The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and bias, you’ll need a Tensor that can be modified. This leaves out tf.placeholder() and tf.constant(), since those Tensors can’t be modified. This is where the tf.Variable class comes in.
tf.Variable()
x = tf.Variable(5)
The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You’ll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.
Initialization
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.
Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps prevent the model from becoming stuck in the same place every time you train it. You’ll learn more about this in the next lesson, when you study gradient descent.
Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You’ll use the tf.truncated_normal() function to generate random numbers from a normal distribution.
tf.truncated_normal()
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
The tf.truncated_normal() function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.
Since the weights are already helping prevent the model from getting stuck, you don’t need to randomize the bias. Let’s use the simplest solution, setting the bias to 0.
tf.zeros()
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
The tf.zeros() function returns a tensor with all zeros.
Linear Classifier Quiz
You’ll be classifying the handwritten numbers 0, 1, and 2 from the MNIST dataset using TensorFlow. The above is a small sample of the data you’ll be training on. Notice how some of the 1s are written with a serif at the top and at different angles. The similarities and differences will play a part in shaping the weights of the model.
The images above are trained weights for each label (0, 1, and 2). The weights display the unique properties of each digit they have found. Complete this quiz to train your own weights using the MNIST dataset.
Instructions
- Open quiz.py.
  - Implement get_weights to return a tf.Variable of weights
  - Implement get_biases to return a tf.Variable of biases
  - Implement xW + b in the linear function
- Open sandbox.py
  - Initialize all weights

Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function instead of tf.multiply(). Don’t forget that order matters in matrix multiplication, so tf.matmul(a,b) is not the same as tf.matmul(b,a).
quiz.py
# Solution is available in the other "quiz_solution.py" tab
import tensorflow as tf
def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    # TODO: Return biases
    return tf.Variable(tf.zeros(n_labels))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    return tf.add(tf.matmul(input, w), b)
sandbox.py
# Solution is available in the other "sandbox_solution.py" tab
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from quiz import get_weights, get_biases, linear
def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):
        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels
# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3
# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)
# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)
# Linear Function xW + b
logits = linear(features, w, b)
# Training data
train_features, train_labels = mnist_features_labels(n_labels)
with tf.Session() as session:
    # TODO: Initialize session variables
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

    # Print loss
    print('Loss: {}'.format(l))
Quiz: TensorFlow Softmax
TensorFlow Softmax
You might remember in the Intro to TFLearn lesson we used the softmax function to calculate class probabilities as output from the network. The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It’s the perfect function to use as the output activation for a network predicting multiple classes.
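In plain numpy the same idea looks like this (a sketch only; tf.nn.softmax below is what you will actually use):

```python
import numpy as np

def softmax(logits):
    """Squash logits into probabilities between 0 and 1 that sum to 1."""
    exps = np.exp(logits - np.max(logits))  # shift for numerical stability
    return exps / np.sum(exps)

probs = softmax([2.0, 1.0, 0.2])
print(probs)        # approximately [0.65, 0.24, 0.11]
print(probs.sum())  # 1.0 (up to floating point rounding)
```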
TensorFlow Softmax
We’re using TensorFlow to build neural networks and, appropriately, there’s a function for calculating softmax.
x = tf.nn.softmax([2.0, 1.0, 0.2])
Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.
Quiz
Use the softmax function in the quiz below to return the softmax of the logits.
quiz.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf
def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)

    # TODO: Calculate the softmax of the logits
    # softmax =
    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        # TODO: Feed in the logit data
        # output = sess.run(softmax, )
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output
One-Hot Encoding
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/phYsxqlilUk.mp4
One-Hot Encoding With Scikit-Learn
Transforming your labels into one-hot encoded vectors is pretty simple with scikit-learn using LabelBinarizer. Check it out below!
import numpy as np
from sklearn import preprocessing
# Example labels
labels = np.array([1,5,3,2,1,4,2,1,3])
# Create the encoder
lb = preprocessing.LabelBinarizer()
# Here the encoder finds the classes and assigns one-hot vectors
lb.fit(labels)
# And finally, transform the labels into one-hot encoded vectors
lb.transform(labels)
>>> array([[1, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 1, 0, 0],
[0, 1, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 1, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 1, 0, 0]])
Quiz: TensorFlow Cross Entropy
Cross Entropy in TensorFlow
In the Intro to TFLearn lesson we discussed using cross entropy as the cost function for classification with one-hot encoded labels. Again, TensorFlow has a function to do the cross entropy calculations for us.
Let’s take what you learned from the video and create a cross entropy function in TensorFlow. To create a cross entropy function in TensorFlow, you’ll need to use two new functions:
Reduce Sum
x = tf.reduce_sum([1, 2, 3, 4, 5]) # 15
The tf.reduce_sum() function takes an array of numbers and sums them together.
Natural Log
x = tf.log(100) # 4.60517
This function does exactly what you would expect it to do. tf.log() takes the natural log of a number.
Quiz
Print the cross entropy using softmax_data and one_hot_data.
(Alternative link for users in China.)
quiz.py
# Solution is available in the other "solution.py" tab
import tensorflow as tf
softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]
softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)
# TODO: Print cross entropy from session
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))
with tf.Session() as sess:
    print(sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data}))
0.356675
Minimizing Cross Entropy
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YrDMXFhvh9E.mp4
Transition into Practical Aspects of Learning
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bKqkRFOOKoA.mp4
Quiz: Numerical Stability
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_SbGcOS-jcQ.mp4
a = 1000000000
for i in range(1000000):
    a = a + 1e-6
print(a - 1000000000)
0.953674316406
Normalized Inputs and Initial Weights
Measuring Performance
Optimizing a Logistic Classifier
Stochastic Gradient Descent
Momentum and Learning Rate Decay
Parameter Hyperspace
Quiz: Mini-batch
Mini-batching
In this section, you’ll go over what mini-batching is and how to apply it in TensorFlow.
Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.
Mini-batching is computationally inefficient, since you can’t calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.
It’s also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you’re performing SGD with each batch.
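A minimal sketch of that "shuffle, then batch" idea (the helper name here is made up; the batches function you implement later in this lesson does the batching step without the shuffle):

```python
import numpy as np

def shuffled_batches(batch_size, features, labels):
    """Shuffle the data, then yield mini-batches of at most batch_size samples."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    order = np.random.permutation(len(features))  # new random order each epoch
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]
```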
Let’s look at the MNIST dataset with weights and a bias to see if your machine can handle it.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)
# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
Question 1
Calculate the memory size of train_features, train_labels, weights, and bias in bytes. Ignore memory for overhead, just calculate the memory required for the stored data.
You may have to look up how much memory a float32 requires, using this link.
train_features Shape: (55000, 784) Type: float32
train_labels Shape: (55000, 10) Type: float32
weights Shape: (784, 10) Type: float32
bias Shape: (10,) Type: float32
How many bytes of memory does train_features need?
55000 × 784 × 4 = 172,480,000 bytes
How many bytes of memory does train_labels need?
55000 × 10 × 4 = 2,200,000 bytes
How many bytes of memory does weights need?
784 × 10 × 4 = 31,360 bytes
How many bytes of memory does bias need?
10 × 4 = 40 bytes
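These numbers are just shape × 4 bytes per float32, which you can verify quickly in Python:

```python
import numpy as np

bytes_per_float32 = np.dtype(np.float32).itemsize  # 4 bytes

print(55000 * 784 * bytes_per_float32)  # train_features: 172,480,000 bytes
print(55000 * 10 * bytes_per_float32)   # train_labels:     2,200,000 bytes
print(784 * 10 * bytes_per_float32)     # weights:             31,360 bytes
print(10 * bytes_per_float32)           # bias:                    40 bytes
```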
The total memory space required for the inputs, weights and bias is around 174 megabytes, which isn’t that much memory. You could train this whole dataset on most CPUs and GPUs.
But the larger datasets that you’ll use in the future are measured in gigabytes or more. It’s possible to purchase more memory, but it’s expensive. A Titan X GPU with 12 GB of memory costs over $1,000.
Instead, in order to run large models on your machine, you’ll learn how to use mini-batching.
Let’s look at how you implement mini-batching in TensorFlow.
TensorFlow Mini-batching
In order to use mini-batching, you must first divide your data into batches.
Unfortunately, it’s sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you’d like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you’d wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7 × 128 + 1 × 104 = 1000)
In that case, the size of the batches would vary, so you need to take advantage of TensorFlow’s tf.placeholder() function to receive the varying batch sizes.
Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
What does None do here?
The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.
Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
Question 2
Using the parameters below, how many batches are there, and what is the last batch size?
features is (50000, 400)
labels is (50000, 10)
batch_size is 128
How many batches are there?
50000 // 128 + 1 = 391
What is the last batch size?
50000 % 128 = 80
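The same arithmetic in Python (float() keeps the division correct under both Python 2 and 3):

```python
import math

n_samples, batch_size = 50000, 128

n_batches = int(math.ceil(n_samples / float(batch_size)))  # 391
last_batch_size = n_samples % batch_size                   # 80
print(n_batches, last_batch_size)
```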
Now that you know the basics, let’s learn how to implement mini-batching.
Question 3
Implement the batches function to batch features and labels. The function should return each batch with a maximum size of batch_size. To help you with the quiz, look at the following example output of a working batches function.
# 4 Samples of features
example_features = [
['F11','F12','F13','F14'],
['F21','F22','F23','F24'],
['F31','F32','F33','F34'],
['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
['L11','L12'],
['L21','L22'],
['L31','L32'],
['L41','L42']]
example_batches = batches(3, example_features, example_labels)
The example_batches variable would be the following:
[
    # 2 batches:
    #   First is a batch of size 3.
    #   Second is a batch of size 1
    [
        # First Batch is size 3
        [
            # 3 samples of features.
            # There are 4 features per sample.
            ['F11', 'F12', 'F13', 'F14'],
            ['F21', 'F22', 'F23', 'F24'],
            ['F31', 'F32', 'F33', 'F34']
        ], [
            # 3 samples of labels.
            # There are 2 labels per sample.
            ['L11', 'L12'],
            ['L21', 'L22'],
            ['L31', 'L32']
        ]
    ], [
        # Second Batch is size 1.
        # Since batch size is 3, there is only one sample left from the 4 samples.
        [
            # 1 sample of features.
            ['F41', 'F42', 'F43', 'F44']
        ], [
            # 1 sample of labels.
            ['L41', 'L42']
        ]
    ]
]
Implement the batches function in the “quiz.py” file below.
“quiz.py”
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches
“sandbox.py”
from quiz import batches
from pprint import pprint
# 4 Samples of features
example_features = [
['F11','F12','F13','F14'],
['F21','F22','F23','F24'],
['F31','F32','F33','F34'],
['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
['L11','L12'],
['L21','L22'],
['L31','L32'],
['L41','L42']]
# PPrint prints data structures like 2d arrays, so they are easier to read
pprint(batches(3, example_features, example_labels))
Let’s use mini-batching to feed batches of MNIST features and labels into a linear model.
Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.
“quiz.py”
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)
# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    # TODO: Train optimizer on all batches
    # for batch_features, batch_labels in ______
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
The accuracy is low, but you probably know that you could train on the dataset more than once. You can train a model using the dataset multiple times. You’ll go over this subject in the next section where we talk about “epochs”.
Epochs
Epochs
An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
The following TensorFlow code trains a model using 10 epochs.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches # Helper function created in Mini-batching section
def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)
# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images
train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init = tf.global_variables_initializer()
batch_size = 128
epochs = 10
learn_rate = 0.001
train_batches = batches(batch_size, train_features, train_labels)
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):
        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
Running the code will output the following:
Epoch: 0 - Cost: 11.0 Valid Accuracy: 0.204
Epoch: 1 - Cost: 9.95 Valid Accuracy: 0.229
Epoch: 2 - Cost: 9.18 Valid Accuracy: 0.246
Epoch: 3 - Cost: 8.59 Valid Accuracy: 0.264
Epoch: 4 - Cost: 8.13 Valid Accuracy: 0.283
Epoch: 5 - Cost: 7.77 Valid Accuracy: 0.301
Epoch: 6 - Cost: 7.47 Valid Accuracy: 0.316
Epoch: 7 - Cost: 7.2 Valid Accuracy: 0.328
Epoch: 8 - Cost: 6.96 Valid Accuracy: 0.342
Epoch: 9 - Cost: 6.73 Valid Accuracy: 0.36
Test Accuracy: 0.3801000118255615
Each epoch attempts to move to a lower cost, leading to better accuracy.
This model continues to improve accuracy up to Epoch 9. Let’s increase the number of epochs to 100.
...
Epoch: 79 - Cost: 0.111 Valid Accuracy: 0.86
Epoch: 80 - Cost: 0.11 Valid Accuracy: 0.869
Epoch: 81 - Cost: 0.109 Valid Accuracy: 0.869
....
Epoch: 85 - Cost: 0.107 Valid Accuracy: 0.869
Epoch: 86 - Cost: 0.107 Valid Accuracy: 0.869
Epoch: 87 - Cost: 0.106 Valid Accuracy: 0.869
Epoch: 88 - Cost: 0.106 Valid Accuracy: 0.869
Epoch: 89 - Cost: 0.105 Valid Accuracy: 0.869
Epoch: 90 - Cost: 0.105 Valid Accuracy: 0.869
Epoch: 91 - Cost: 0.104 Valid Accuracy: 0.869
Epoch: 92 - Cost: 0.103 Valid Accuracy: 0.869
Epoch: 93 - Cost: 0.103 Valid Accuracy: 0.869
Epoch: 94 - Cost: 0.102 Valid Accuracy: 0.869
Epoch: 95 - Cost: 0.102 Valid Accuracy: 0.869
Epoch: 96 - Cost: 0.101 Valid Accuracy: 0.869
Epoch: 97 - Cost: 0.101 Valid Accuracy: 0.869
Epoch: 98 - Cost: 0.1 Valid Accuracy: 0.869
Epoch: 99 - Cost: 0.1 Valid Accuracy: 0.869
Test Accuracy: 0.8696000006198883
From looking at the output above, you can see the model doesn’t increase the validation accuracy after epoch 80. Let’s see what happens when we increase the learning rate.
learn_rate = 0.1
Epoch: 76 - Cost: 0.214 Valid Accuracy: 0.752
Epoch: 77 - Cost: 0.21 Valid Accuracy: 0.756
Epoch: 78 - Cost: 0.21 Valid Accuracy: 0.756
...
Epoch: 85 - Cost: 0.207 Valid Accuracy: 0.756
Epoch: 86 - Cost: 0.209 Valid Accuracy: 0.756
Epoch: 87 - Cost: 0.205 Valid Accuracy: 0.756
Epoch: 88 - Cost: 0.208 Valid Accuracy: 0.756
Epoch: 89 - Cost: 0.205 Valid Accuracy: 0.756
Epoch: 90 - Cost: 0.202 Valid Accuracy: 0.756
Epoch: 91 - Cost: 0.207 Valid Accuracy: 0.756
Epoch: 92 - Cost: 0.204 Valid Accuracy: 0.756
Epoch: 93 - Cost: 0.206 Valid Accuracy: 0.756
Epoch: 94 - Cost: 0.202 Valid Accuracy: 0.756
Epoch: 95 - Cost: 0.2974 Valid Accuracy: 0.756
Epoch: 96 - Cost: 0.202 Valid Accuracy: 0.756
Epoch: 97 - Cost: 0.2996 Valid Accuracy: 0.756
Epoch: 98 - Cost: 0.203 Valid Accuracy: 0.756
Epoch: 99 - Cost: 0.2987 Valid Accuracy: 0.756
Test Accuracy: 0.7556000053882599
Looks like the learning rate was increased too much. The final accuracy was lower, and it stopped improving earlier. Let’s stick with the previous learning rate, but change the number of epochs to 80.
Epoch: 65 - Cost: 0.122 Valid Accuracy: 0.868
Epoch: 66 - Cost: 0.121 Valid Accuracy: 0.868
Epoch: 67 - Cost: 0.12 Valid Accuracy: 0.868
Epoch: 68 - Cost: 0.119 Valid Accuracy: 0.868
Epoch: 69 - Cost: 0.118 Valid Accuracy: 0.868
Epoch: 70 - Cost: 0.118 Valid Accuracy: 0.868
Epoch: 71 - Cost: 0.117 Valid Accuracy: 0.868
Epoch: 72 - Cost: 0.116 Valid Accuracy: 0.868
Epoch: 73 - Cost: 0.115 Valid Accuracy: 0.868
Epoch: 74 - Cost: 0.115 Valid Accuracy: 0.868
Epoch: 75 - Cost: 0.114 Valid Accuracy: 0.868
Epoch: 76 - Cost: 0.113 Valid Accuracy: 0.868
Epoch: 77 - Cost: 0.113 Valid Accuracy: 0.868
Epoch: 78 - Cost: 0.112 Valid Accuracy: 0.868
Epoch: 79 - Cost: 0.111 Valid Accuracy: 0.868
Epoch: 80 - Cost: 0.111 Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667
The accuracy only reached 0.86, but that could be because the learning rate was too high. Lowering the learning rate would require more epochs, but could ultimately achieve better accuracy.
In the upcoming TensorFlow Lab, you'll get the opportunity to choose your own learning rate, epoch count, and batch size to improve the model's accuracy.
More about epochs on Quora.
INTRO TO NEURAL NETWORKS
Intro to Neural Networks
Introducing Luis
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nto-stLuN6M.mp4
Logistic Regression Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kSs6O3R7JUI.mp4
Logistic Regression Answer
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1iNylA3fJDs.mp4
Neural Networks
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Mqogpnp1lrU.mp4
Perceptron
Perceptron
Now you’ve seen how a simple neural network makes decisions: by taking in input data, processing that information, and finally, producing an output in the form of a decision! Let’s take a deeper dive into the university admission example and learn more about how this input data is processed.
Data, like test scores and grades, is fed into a network of interconnected nodes. These individual nodes are called perceptrons or neurons, and they are the basic unit of a neural network. Each one looks at input data and decides how to categorize that data. In the example above, the input either passes a threshold for grades and test scores or doesn’t, and so the two categories are: yes (passed the threshold) and no (didn’t pass the threshold). These categories then combine to form a decision – for example, if both nodes produce a “yes” output, then this student gains admission into the university.
Let’s zoom in even further and look at how a single perceptron processes input data.
The perceptron above is one of the two perceptrons from the video that help determine whether or not a student is accepted to a university. It decides whether a student’s grades are high enough to be accepted to the university. You might be wondering: “How does it know whether grades or test scores are more important in making this acceptance decision?” Well, when we initialize a neural network, we don’t know what information will be most important in making a decision. It’s up to the neural network to learn for itself which data is most important and adjust how it considers that data.
It does this with something called weights.
Weights
When input data comes into a perceptron, it gets multiplied by a weight value that is assigned to this particular input. For example, the perceptron above has two inputs, tests (for test scores) and grades, so it has two associated weights that can be adjusted individually. These weights start out as random values, and as the neural network learns more about what kind of input data leads to a student being accepted into a university, the network adjusts the weights based on any errors in categorization that the previous weights resulted in. This is called training the neural network.
A higher weight means the neural network considers that input more important than other inputs, and a lower weight means that the data is considered less important. An extreme example would be if test scores had no effect at all on university acceptance; then the weight of the test score input data would be zero and it would have no effect on the output of the perceptron.
Summing the Input Data
So, each input to a perceptron has an associated weight that represents its importance, and these weights are determined during the learning process of a neural network, called training. In the next step, the weighted input data is summed to produce a single value that will help determine the final output - whether a student is accepted to a university or not. Let's see a concrete example of this.
When writing equations related to neural networks, the weights will always be represented by some type of the letter w. It will usually look like a W when it represents a matrix of weights or a w when it represents an individual weight, and it may include some additional information in the form of a subscript to specify which weights (you’ll see more on that next). But remember, when you see the letter w, think weights.
In this example, we'll use $w_{grades}$ for the weight of grades and $w_{test}$ for the weight of test. For the image above, let's say that the weights are: $w_{grades}=-1$, $w_{test}=-0.2$. You don't have to be concerned with the actual values, but their relative values are important. $w_{grades}$ is 5 times larger than $w_{test}$, which means the neural network considers the grades input 5 times more important than test in determining whether a student will be accepted into a university.
The perceptron applies these weights to the inputs and sums them in a process known as linear combination. In our case, this looks like $$w_{grades} \cdot x_{grades} + w_{test} \cdot x_{test} = -1 \cdot x_{grades} - 0.2 \cdot x_{test}$$
Now, to make our equation less wordy, let's replace the explicit names with numbers. Let's use 1 for grades and 2 for tests. So now our equation becomes $$w_{1} \cdot x_{1} + w_{2} \cdot x_{2}$$
In this example, we just have 2 simple inputs: grades and tests. Let's imagine we instead had m different inputs and we labeled them $x_{1}, x_{2}, \dots, x_{m}$. Let's also say that the weight corresponding to $x_{1}$ is $w_{1}$ and so on. In that case, we would express the linear combination succinctly as:
$$\sum_{i=1}^{m} w_{i} \cdot x_{i}$$
Here, the Greek letter sigma $\Sigma$ is used to represent summation. It simply means to evaluate the equation to the right multiple times and add up the results. In this case, the equation it will sum is $w_{i} \cdot x_{i}$.
But where do we get $w_{i}$ and $x_{i}$?
$\sum_{i=1}^{m}$ means to iterate over all i values, from 1 to m.
So to put it all together, $\sum_{i=1}^{m} w_{i} \cdot x_{i}$ means the following:
- Start at i=1
- Evaluate $w_{1} \cdot x_{1}$ and remember the result
- Move to i=2
- Evaluate $w_{2} \cdot x_{2}$ and add the result to $w_{1} \cdot x_{1}$
- Continue repeating that process until i=m, where m is the number of inputs.
One last thing: you'll see equations written many different ways, both here and when reading on your own. For example, you will often just see $\sum_{i}$ instead of $\sum_{i=1}^{m}$. The first is simply a shorter way of writing the second. That is, if you see a summation without a starting number or a defined end value, it just means perform the sum for all of them. And sometimes, if the value to iterate over can be inferred, you'll see it as just $\Sigma$. Just remember they're all the same thing: $\sum_{i=1}^{m} w_{i} x_{i} = \sum_{i} w_{i} x_{i} = \sum w_{i} x_{i}$.
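To make the notation concrete, here's a small illustrative sketch (not from the lesson; the weight and input values are made up) that computes the same summation in Python, first as a loop and then as a dot product:
import numpy as np
# Hypothetical values for illustration only
w = [-1.0, -0.2]   # w_1 (grades weight), w_2 (test weight)
x = [0.8, 0.9]     # x_1 (grades), x_2 (test score)
# Explicit loop: sum over i = 1..m of w_i * x_i
linear_combination = 0.0
for w_i, x_i in zip(w, x):
    linear_combination += w_i * x_i
# The vectorized equivalent is a dot product
assert np.isclose(linear_combination, np.dot(w, x))
print(linear_combination)  # -0.98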
Calculating the Output with an Activation Function
Finally, the result of the perceptron’s summation is turned into an output signal! This is done by feeding the linear combination into an activation function.
Activation functions are functions that decide, given the inputs into the node, what the node's output should be. Because it's the activation function that decides the actual output, we often refer to the outputs of a layer as its "activations".
One of the simplest activation functions is the Heaviside step function. This function returns a 0 if the linear combination is less than 0. It returns a 1 if the linear combination is positive or equal to zero. The Heaviside step function is shown below, where h is the calculated linear combination:
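Written out (a reconstruction, since the original image isn't reproduced here), the step function is:
$$f(h) = \begin{cases} 0 & \text{if } h < 0 \\ 1 & \text{if } h \geq 0 \end{cases}$$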
In the university acceptance example above, we used the weights $w_{grades} = -1$, $w_{test} = -0.2$. Since $w_{grades}$ and $w_{test}$ are negative values, the activation function will only return a 1 if grades and test are both 0! This is because the range of values from the linear combination using these weights and inputs is $(-\infty, 0]$ (i.e. negative infinity to 0, including 0 itself).
It’s easiest to see this with an example in two dimensions. In the following graph, imagine any points along the line or in the shaded area represent all the possible inputs to our node. Also imagine that the value along the y-axis is the result of performing the linear combination on these inputs and the appropriate weights. It’s this result that gets passed to the activation function.
Now remember that the step activation function returns 1 for any inputs greater than or equal to zero. As you can see in the image, only one point has a y-value greater than or equal to zero – the point right at the origin, (0,0):
Now, we certainly want more than one possible grade/test combination to result in acceptance, so we need to adjust the results passed to our activation function so it activates – that is, returns 1 – for more inputs. Specifically, we need to find a way so all the scores we’d like to consider acceptable for admissions produce values greater than or equal to zero when linearly combined with the weights into our node.
One way to get our function to return 1 for more inputs is to add a value to the results of our linear combination, called a bias.
A bias, represented in equations as b, lets us move values in one direction or another.
For example, the following diagram shows the previous hypothetical function with an added bias of +3. The blue shaded area shows all the values that now activate the function. But notice that these are produced with the same inputs as the values shown shaded in grey – just adjusted higher by adding the bias term:
Of course, with neural networks we won’t know in advance what values to pick for biases. That’s ok, because just like the weights, the bias can also be updated and changed by the neural network during training. So after adding a bias, we now have a complete perceptron formula:
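In the notation used above, that formula (reconstructed here, since the original image isn't included) is:
$$f(x_{1}, x_{2}, \dots, x_{m}) = \begin{cases} 0 & \text{if } b + \sum_{i=1}^{m} w_{i} x_{i} < 0 \\ 1 & \text{if } b + \sum_{i=1}^{m} w_{i} x_{i} \geq 0 \end{cases}$$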
This formula returns 1 if the input $x_{1}, x_{2}, \dots, x_{m}$ belongs to the accepted-to-university category or returns 0 if it doesn't. The input is made up of one or more real numbers, each one represented by $x_{i}$, where m is the number of inputs.
Then the neural network starts to learn! Initially, the weights $w_{i}$ and bias (b) are assigned a random value, and then they are updated using a learning algorithm like gradient descent. The weights and biases change so that the next training example is more accurately categorized, and patterns in data are “learned” by the neural network.
Now that you have a good understanding of perceptrons, let's put that knowledge to use. In the next section, you'll create the AND perceptron from the Neural Networks video by setting the values for weights and bias.
AND Perceptron Quiz
What are the weights and bias for the AND perceptron?
Set the weights (weight1, weight2) and bias to the correct values that calculate the AND operation as shown above.
In this case, there are two inputs as seen in the table above (let's call the first column input1 and the second column input2), and based on the perceptron formula, we can calculate the output.
First, the linear combination will be the sum of the weighted inputs: linear_combination = weight1*input1 + weight2*input2
then we can put this value into the biased Heaviside step function, which will give us our output (0 or 1):
If you still need a hint, think of a concrete example like so:
Consider the case where input1 and input2 are both 1. For an AND perceptron, we want the output to also equal 1! The output is determined by the weights, bias, and Heaviside step function such that
output = 1, if weight1*input1 + weight2*input2 + bias >= 0
or
output = 0, if weight1*input1 + weight2*input2 + bias < 0
So, how can you choose the values for weights and bias so that if both inputs = 1, the output = 1?
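If you want to sanity-check a candidate answer, here's a minimal sketch (the values below are just one possible choice for illustration, not the graded solution) that evaluates an AND perceptron over all four input combinations:
# One possible set of values (an assumption for illustration); many others work
weight1 = 1.0
weight2 = 1.0
bias = -1.5
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for input1, input2 in test_inputs:
    linear_combination = weight1 * input1 + weight2 * input2 + bias
    output = 1 if linear_combination >= 0 else 0   # biased Heaviside step function
    print(input1, input2, output)
# Only (1, 1) produces an output of 1, which matches the AND truth table.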
Gradient Descent
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/29PmNG7fuuM.mp4
Gradient is another term for rate of change or slope. If you need to brush up on this concept, check out Khan Academy’s great lectures on the topic.
Gradient Descent: The Math
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7sxA5Ap8AWM.mp4
Gradient Descent: The Code
Implementing Gradient Descent
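The notebook itself isn't reproduced here, but as a rough sketch of the idea (assuming a single sigmoid output unit and squared-error loss; the data point and starting weights below are made up), one gradient descent update step looks like this:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical single data point and starting weights (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 0.5
w = np.array([0.5, -0.5, 0.3, 0.1])
learnrate = 0.5

h = np.dot(x, w)                    # linear combination of inputs and weights
nn_output = sigmoid(h)              # network prediction
error = y - nn_output               # difference from the target
# Error term uses the sigmoid derivative: f'(h) = f(h) * (1 - f(h))
error_term = error * nn_output * (1 - nn_output)
del_w = learnrate * error_term * x  # gradient descent weight update
w += del_w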
Multilayer Perceptrons
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Rs9petvTBLk.mp4
Khan Academy’s introduction to vectors.
Khan Academy’s introduction to matrices.
Backpropagation
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/MZL97-2joxQ.mp4
Implementing Backpropagation
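As a rough sketch of what implementing backpropagation involves (assuming one hidden sigmoid layer, a single sigmoid output, and squared-error loss; the numbers are placeholders, not the graded exercise):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical data point, target, and weights for illustration only
x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
weights_input_hidden = np.array([[0.5, -0.6], [0.1, -0.2], [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])

# Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)
output = sigmoid(np.dot(hidden_layer_output, weights_hidden_output))

# Backwards pass: push the error back through the network
error = target - output
output_error_term = error * output * (1 - output)
hidden_error_term = weights_hidden_output * output_error_term * \
    hidden_layer_output * (1 - hidden_layer_output)

# Gradient descent steps for both sets of weights
delta_w_h_o = learnrate * output_error_term * hidden_layer_output
delta_w_i_h = learnrate * hidden_error_term * x[:, None]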
Further Reading
From Andrej Karpathy: Yes, you should understand backprop
Also from Andrej Karpathy, a lecture from Stanford’s CS231n course
DEEP NEURAL NETWORKS
Two-Layer Neural Network
Multilayer Neural Networks
In this lesson, you’ll learn how to build multilayer neural networks with TensorFlow. Adding a hidden layer to a network allows it to model more complex functions. Also, using a non-linear activation function on the hidden layer lets it model non-linear functions.
You'll also learn about the ReLU, or rectified linear unit, a non-linear activation function. The ReLU function is 0 for negative inputs and x for all inputs x > 0.
Next, you’ll see how a ReLU hidden layer is implemented in TensorFlow.
Quiz: TensorFlow ReLUs
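The quiz itself isn't reproduced here, but a minimal sketch of a ReLU hidden layer in TensorFlow 1.x looks roughly like this (the shapes and input values below are placeholders, not the quiz's values):
import tensorflow as tf
# Hypothetical hidden-layer and output weights/biases (placeholders for the quiz values)
weights = [
    tf.Variable(tf.truncated_normal([4, 3])),
    tf.Variable(tf.truncated_normal([3, 2]))]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]])
# Hidden layer with ReLU activation, followed by a linear output layer
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits))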
Deep Neural Network in TensorFlow
Deep Neural Network in TensorFlow
You’ve seen how to build a logistic classifier using TensorFlow. Now you’re going to see how to use the logistic classifier to build a deep neural network.
Step by Step
In the following walkthrough, we’ll step through TensorFlow code written to classify the letters in the MNIST database. If you would like to run the network on your computer, the file is provided here. You can find this and many more examples of TensorFlow at Aymeric Damien’s GitHub repository.
Code
TensorFlow MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
You’ll use the MNIST dataset provided by TensorFlow, which batches and One-Hot encodes the data for you.
Learning Parameters
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128 # Decrease batch size if you don't have enough memory
display_step = 1
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
The focus here is on the architecture of multilayer neural networks, not parameter tuning, so here we’ll just give you the learning parameters.
Hidden Layer Parameters
n_hidden_layer = 256 # layer number of features
The variable n_hidden_layer
determines the size of the hidden layer in the neural network. This is also known as the width of a layer.
Weights and Biases
# Store layers weight & bias
weights = {
'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
Deep neural networks use multiple layers, with each layer requiring its own weights and bias. The 'hidden_layer' weights and bias are for the hidden layer. The 'out' weights and bias are for the output layer. If the neural network were deeper, there would be weights and biases for each additional layer.
Input
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
x_flat = tf.reshape(x, [-1, n_input])
The MNIST data is made up of 28px by 28px images with a single channel. The tf.reshape()
function above reshapes the 28px by 28px matrices in x into row vectors of 784px.
Multilayer Perceptron
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
You’ve seen the linear function tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
before, also known as xw + b
. Combining linear functions together using a ReLU will give you a two layer network.
Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(\
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
.minimize(cost)
This is the same optimization technique used in the Intro to TensorFlow lab.
Session
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
The MNIST library in TensorFlow provides the ability to receive the dataset in batches. Calling the mnist.train.next_batch()
function returns a subset of the training data.
Deeper Neural Network
That’s it! Going from one layer to two is easy. Adding more layers to the network allows you to solve more complicated problems. In the next video, you’ll see how changing the number of layers can affect your network.
Training a Deep Learning Network
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/CsB7yUtMJyk.mp4
Save and Restore TensorFlow Models
Save and Restore TensorFlow Models
Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you were to reuse the model in the future, you would have to train it all over again!
Fortunately, TensorFlow gives you the ability to save your progress using a class called tf.train.Saver
. This class provides the functionality to save any tf.Variable
to your file system.
Saving Variables
Let’s start with a simple example of saving weights
and bias
Tensors. For the first example you’ll just save two variables. Later examples will save all the weights in a practical model.
import tensorflow as tf
# The file path to save the data
save_file = './model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()
with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())
    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
    # Save the model
    saver.save(sess, save_file)
Weights:
[[-0.97990924 1.03016174 0.74119264]
[-0.82581609 -0.07361362 -0.86653847]]
Bias:
[ 1.62978125 -0.37812829 0.64723819]
The Tensors weights
and bias
are set to random values using the tf.truncated_normal()
function. The values are then saved to the save_file
location, “model.ckpt”, using the tf.train.Saver.save()
function. (The “.ckpt” extension stands for “checkpoint”.)
If you’re using TensorFlow 0.11.0RC1 or newer, a file called “model.ckpt.meta” will also be created. This file contains the TensorFlow graph.
Loading Variables
Now that the Tensor Variables are saved, let’s load them back into a new model.
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()
with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)
    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
Weights:
[[-0.97990924 1.03016174 0.74119264]
[-0.82581609 -0.07361362 -0.86653847]]
Bias:
[ 1.62978125 -0.37812829 0.64723819]
You’ll notice you still need to create the weights
and bias
Tensors in Python. The tf.train.Saver.restore()
function loads the saved data into weights
and bias
.
Since tf.train.Saver.restore()
sets all the TensorFlow Variables, you don’t need to call tf.global_variables_initializer()
.
Save a Trained Model
Let’s see how to train a model and save its weights.
First start with a model:
# Remove previous Tensors and Operations
tf.reset_default_graph()
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
learning_rate = 0.001
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)
# Define loss and optimizer
cost = tf.reduce_mean(\
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
.minimize(cost)
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Let’s train that model, then save the weights:
import math
save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})
        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))
    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')
Epoch 0 - Validation Accuracy: 0.06859999895095825
Epoch 10 - Validation Accuracy: 0.20239999890327454
Epoch 20 - Validation Accuracy: 0.36980000138282776
Epoch 30 - Validation Accuracy: 0.48820000886917114
Epoch 40 - Validation Accuracy: 0.5601999759674072
Epoch 50 - Validation Accuracy: 0.6097999811172485
Epoch 60 - Validation Accuracy: 0.6425999999046326
Epoch 70 - Validation Accuracy: 0.6733999848365784
Epoch 80 - Validation Accuracy: 0.6916000247001648
Epoch 90 - Validation Accuracy: 0.7113999724388123
Trained Model Saved.
Load a Trained Model
Let's load the weights and bias from the saved checkpoint, then check the test accuracy.
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})
    print('Test Accuracy: {}'.format(test_accuracy))
Test Accuracy: 0.7229999899864197
That’s it! You now know how to save and load a trained model in TensorFlow. Let’s look at loading weights and biases into modified models in the next section.
Finetuning
Loading the Weights and Biases into a New Model
Sometimes you might want to adjust, or “finetune” a model that you have already trained and saved.
However, loading saved Variables directly into a modified model can generate errors. Let’s go over how to avoid these problems.
Naming Error
TensorFlow uses a string identifier for Tensors and Operations called name. If a name is not given, TensorFlow will create one automatically, giving the first node a name like <Type> and subsequent nodes names like <Type>_<number>. Let's see how this can affect loading a model with a different order of weights and bias:
import tensorflow as tf
# Remove the previous weights and bias
tf.reset_default_graph()
save_file = 'model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))
with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)
The code above prints out the following:
Save Weights: Variable:0
Save Bias: Variable_1:0
Load Weights: Variable_1:0
Load Bias: Variable:0
…
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
…
You’ll notice that the name
properties for weights
and bias
are different than when you saved the model. This is why the code produces the “Assign requires shapes of both tensors to match” error. The code saver.restore(sess, save_file)
is trying to load weight data into bias
and bias data into weights
.
Instead of letting TensorFlow set the name
property, let’s set it manually:
import tensorflow as tf
tf.reset_default_graph()
save_file = 'model.ckpt'
# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
# Remove the previous weights and bias
tf.reset_default_graph()
# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]) ,name='weights_0')
saver = tf.train.Saver()
# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))
with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)
    print('Loaded Weights and Bias successfully.')
Save Weights: weights_0:0
Save Bias: bias_0:0
Load Weights: weights_0:0
Load Bias: bias_0:0
Loaded Weights and Bias successfully.
That worked! The Tensor names match and the data loaded correctly.
Regularization Intro
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pECnr-5F3_Q.mp4
Regularization
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/QcJBhbuCl5g.mp4
Regularization Quiz
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/E0eEW6V0_sA.mp4
Dropout
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6DcImJS8uV8.mp4
Dropout Pt. 2
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8nG8zzJMbZw.mp4
Quiz: TensorFlow Dropout
TensorFlow Dropout
https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all of those units’ incoming and outgoing connections. Figure 1 illustrates how dropout works.
TensorFlow provides the tf.nn.dropout()
function, which you can use to implement dropout.
Let’s look at an example of how to use tf.nn.dropout()
.
keep_prob = tf.placeholder(tf.float32) # probability to keep units
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
The code above illustrates how to apply dropout to a neural network.
The tf.nn.dropout()
function takes in two parameters:
- hidden_layer: the tensor to which you would like to apply dropout
- keep_prob: the probability of keeping (i.e. not dropping) any given unit
keep_prob
allows you to adjust the number of units to drop. In order to compensate for dropped units, tf.nn.dropout()
multiplies all units that are kept (i.e. not dropped) by 1/keep_prob
.
During training, a good starting value for keep_prob
is 0.5
.
During testing, use a keep_prob
value of 1.0
to keep all units and maximize the power of the model.
Quiz 1
Take a look at the code snippet below. Do you see what’s wrong?
There's nothing wrong with the syntax; however, the test accuracy is extremely low.
...
keep_prob = tf.placeholder(tf.float32) # probability to keep units
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....
            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})
        validation_accuracy = sess.run(accuracy, feed_dict={
            features: test_features,
            labels: test_labels,
            keep_prob: 0.5})
QUESTION 1 OF 2
What’s wrong with the above code?
Dropout doesn't work with batching.
The keep_prob value of 0.5 is too low.
There shouldn't be a value passed to keep_prob when testing for accuracy.
(correct) keep_prob should be set to 1.0 when evaluating validation accuracy.
Quiz 2
This quiz starts with the code from the ReLU quiz and applies a dropout layer. Build a model with a ReLU layer and a dropout layer, using the keep_prob placeholder to pass in a probability of 0.5. Print the logits from the model.
Note: Output will be different every time the code is run. This is caused by dropout randomizing the units it drops.
“solution.py”
# Quiz Solution
# Note: You can't run code in this tab
import tensorflow as tf
hidden_layer_weights = [
[0.1, 0.2, 0.4],
[0.4, 0.6, 0.6],
[0.5, 0.9, 0.1],
[0.8, 0.2, 0.8]]
out_weights = [
[0.1, 0.6],
[0.2, 0.1],
[0.7, 0.9]]
# Weights and biases
weights = [
tf.Variable(hidden_layer_weights),
tf.Variable(out_weights)]
biases = [
tf.Variable(tf.zeros(3)),
tf.Variable(tf.zeros(2))]
# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])
# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))
[[ 1.10000002 6.60000038]
[ 0.30800003 0.7700001 ]
[ 9.56000042 4.78000021]]
CONVOLUTIONAL NEURAL NETWORKS
Intro To CNNs
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B61jxZ4rkMs.mp4
Color
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BdQccpMwk80.mp4
QUIZ QUESTION
What would be easier for your classifier to learn?
R, G, B
(correct)(R + G + B) / 3
Statistical Invariance
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0Hr5YwUUhr0.mp4
Convolutional Networks
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ISHGyvsT0QY.mp4
Intuition
Intuition
Let’s develop better intuition for how Convolutional Neural Networks (CNN) work. We’ll examine how humans classify images, and then see how CNNs use similar approaches.
Let’s say we wanted to classify the following image of a dog as a Golden Retriever.
As humans, how do we do this?
One thing we do is that we identify certain parts of the dog, such as the nose, the eyes, and the fur. We essentially break up the image into smaller pieces, recognize the smaller pieces, and then combine those pieces to get an idea of the overall dog.
In this case, we might break down the image into a combination of the following:
- A nose
- Two eyes
- Golden fur
These pieces can be seen below:
Going One Step Further
But let's take this one step further. How do we determine what exactly a nose is? A Golden Retriever nose can be seen as an oval with two black holes inside it. Thus, one way of classifying a Retriever's nose is to break it up into smaller pieces and look for black holes (nostrils) and curves that define an oval, as shown below.
Broadly speaking, this is what a CNN learns to do. It learns to recognize basic lines and curves, then shapes and blobs, and then increasingly complex objects within the image. Finally, the CNN classifies the image by combining the larger, more complex objects.
In our case, the levels in the hierarchy are:
- Simple shapes, like ovals and dark circles
- Complex objects (combinations of simple shapes), like eyes, nose, and fur
- The dog as a whole (a combination of complex objects)
With deep learning, we don’t actually program the CNN to recognize these specific features. Rather, the CNN learns on its own to recognize such objects through forward propagation and backpropagation!
It’s amazing how well a CNN can learn to classify images, even though we never program the CNN with information about specific features to look for.
A CNN might have several layers, and each layer might capture a different level in the hierarchy of objects. The first layer is the lowest level in the hierarchy, where the CNN generally classifies small parts of the image into simple shapes like horizontal and vertical lines and simple blobs of colors. The subsequent layers tend to be higher levels in the hierarchy and generally classify more complex ideas like shapes (combinations of lines), and eventually full objects like dogs.
Once again, the CNN learns all of this on its own. We don’t ever have to tell the CNN to go looking for lines or curves or noses or fur. The CNN just learns from the training set and discovers which characteristics of a Golden Retriever are worth looking for.
That’s a good start! Hopefully you’ve developed some intuition about how CNNs work.
Next, let’s look at some implementation details.
Filters
Breaking up an Image
The first step for a CNN is to break up the image into smaller pieces. We do this by selecting a width and height that defines a filter.
The filter looks at small pieces, or patches, of the image. These patches are the same size as the filter.
We then simply slide this filter horizontally or vertically to focus on a different piece of the image.
The amount by which the filter slides is referred to as the ‘stride’. The stride is a hyperparameter which you, the engineer, can tune. Increasing the stride reduces the size of your model by reducing the number of total patches each layer observes. However, this usually comes with a reduction in accuracy.
Let’s look at an example. In this zoomed in image of the dog, we first start with the patch outlined in red. The width and height of our filter define the size of this square.
We then move the square over to the right by a given stride (2 in this case) to get another patch.
What’s important here is that we are grouping together adjacent pixels and treating them as a collective.
In a normal, non-convolutional neural network, we would have ignored this adjacency. In a normal network, we would have connected every pixel in the input image to a neuron in the next layer. In doing so, we would not have taken advantage of the fact that pixels in an image are close together for a reason and have special meaning.
By taking advantage of this local structure, our CNN learns to classify local patterns, like shapes and objects, in an image.
Filter Depth
It's common to have more than one filter. Different filters pick up different qualities of a patch. For example, one filter might look for a particular color, while another might look for a kind of object of a specific shape. The number of filters in a convolutional layer is called the filter depth.
How many neurons does each patch connect to?
That's dependent on our filter depth. If we have a depth of k, we connect each patch of pixels to k neurons in the next layer. This gives us the height of k in the next layer, as shown below. In practice, k is a hyperparameter we tune, and most CNNs tend to pick the same starting values.
But why connect a single patch to multiple neurons in the next layer? Isn’t one neuron good enough?
Multiple neurons can be useful because a patch can have multiple interesting characteristics that we want to capture.
For example, one patch might include some white teeth, some blonde whiskers, and part of a red tongue. In that case, we might want a filter depth of at least three - one for each of teeth, whiskers, and tongue.
Having multiple neurons for a given patch ensures that our CNN can learn to capture whatever characteristics the CNN learns are important.
Remember that the CNN isn’t “programmed” to look for certain characteristics. Rather, it learns on its own which characteristics to notice.
Feature Map Sizes
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lp1NrLZnCUM.mp4
What are the width, height and depth for padding = ‘same’, stride = 1?
Enter your answers in the format “width, height, depth”
28, 28, 8
What are the width, height and depth for padding = 'valid', stride = 1?
Enter your answers in the format "width, height, depth"
26, 26, 8
What are the width, height and depth for padding = 'valid', stride = 2?
Enter your answers in the format “width, height, depth”
13,13,8
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/W4xtf8LTz1c.mp4
Convolutions continued
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/utOv-BKI_vo.mp4
Convolutions Cont.
Note, a “Fully Connected” layer is a standard, non convolutional layer, where all inputs are connected to all output neurons. This is also referred to as a “dense” layer, and is what we used in the previous two lessons.
Parameters
Parameter Sharing
The weights, w, are shared across patches for a given layer in a CNN to detect the cat above regardless of where in the image it is located.
When we are trying to classify a picture of a cat, we don’t care where in the image a cat is. If it’s in the top left or the bottom right, it’s still a cat in our eyes. We would like our CNNs to also possess this ability known as translation invariance. How can we achieve this?
As we saw earlier, the classification of a given patch in an image is determined by the weights and biases corresponding to that patch.
If we want a cat that’s in the top left patch to be classified in the same way as a cat in the bottom right patch, we need the weights and biases corresponding to those patches to be the same, so that they are classified the same way.
This is exactly what we do in CNNs. The weights and biases we learn for a given output layer are shared across all patches in a given input layer. Note that as we increase the depth of our filter, the number of weights and biases we have to learn still increases, as the weights aren’t shared across the output channels.
There’s an additional benefit to sharing our parameters. If we did not reuse the same weights across all patches, we would have to learn new parameters for every single patch and hidden layer neuron pair. This does not scale well, especially for higher fidelity images. Thus, sharing parameters not only helps us with translation invariance, but also gives us a smaller, more scalable model.
Padding
A 5x5 grid with a 3x3 filter. Source: Andrej Karpathy.
Let's say we have a 5x5 grid (as shown above) and a filter of size 3x3 with a stride of 1. What's the width and height of the next layer? We see that we can fit at most three patches in each direction, giving us a dimension of 3x3 in our next layer. As we can see, the width and height of each subsequent layer decreases in such a scheme.
In an ideal world, we’d be able to maintain the same width and height across layers so that we can continue to add layers without worrying about the dimensionality shrinking and so that we have consistency. How might we achieve this? One way is to simply add a border of 0s to our original 5x5 image. You can see what this looks like in the below image.
The same grid with 0 padding. Source: Andrej Karpathy.
This would expand our original image to a 7x7. With this, we now see how our next layer's size is again a 5x5, keeping our dimensionality consistent.
Dimensionality
From what we’ve learned so far, how can we calculate the number of neurons of each layer in our CNN?
Given:
- our input layer has a width of W and a height of H
- our convolutional layer has a filter size F
- we have a stride of S
- a padding of P
- and a filter depth of K,
the following formula gives us the width of the next layer: W_out = (W - F + 2P)/S + 1.
The output height would be H_out = (H - F + 2P)/S + 1.
And the output depth would be equal to the filter depth: D_out = K.
The output volume would be W_out * H_out * D_out.
Knowing the dimensionality of each additional layer helps us understand how large our model is and how our decisions around filter size and stride affect the size of our network.
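As a quick sanity check, the formulas above can be dropped into a small helper function (a sketch for illustration, not part of the lesson):
def conv_output_shape(W, H, F, S, P, K):
    """Return (W_out, H_out, D_out) for a convolutional layer using the formulas above."""
    W_out = (W - F + 2 * P) // S + 1
    H_out = (H - F + 2 * P) // S + 1
    return W_out, H_out, K

# Example from the upcoming quiz: 32x32x3 input, 8x8 filters, stride 2, padding 1, 20 filters
print(conv_output_shape(W=32, H=32, F=8, S=2, P=1, K=20))  # (14, 14, 20)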
Quiz: Convolution Output Shape
Introduction
For the next few quizzes we’ll test your understanding of the dimensions in CNNs. Understanding dimensions will help you make accurate tradeoffs between model size and performance. As you’ll see, some parameters have a much bigger impact on model size than others.
Setup
H = height, W = width, D = depth
- We have an input of shape 32x32x3 (HxWxD)
- 20 filters of shape 8x8x3 (HxWxD)
- A stride of 2 for both the height and width (S)
- Valid padding of size 1 ( P )
Recall the formula for calculating the new height or width:
new_height = (input_height - filter_height + 2 * P)/S + 1
new_width = (input_width - filter_width + 2 * P)/S + 1
Convolutional Layer Output Shape
What's the shape of the output? The answer format is HxWxD, so if you think the new height is 9, new width is 9, and new depth is 5, then type 9x9x5.
14x14x20
Solution: Convolution Output
Solution
The answer is 14x14x20.
We can get the new height and width with the formula resulting in:
(32 - 8 + 2 * 1)/2 + 1 = 14
(32 - 8 + 2 * 1)/2 + 1 = 14
The new depth is equal to the number of filters, which is 20.
This would correspond to the following code:
input = tf.placeholder(tf.float32, (None, 32, 32, 3))
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias
Note the output shape of conv will be [1, 13, 13, 20]. It's 4D to account for batch size, but more importantly, it's not [1, 14, 14, 20]. This is because the padding algorithm TensorFlow uses is not exactly the same as the one above. An alternative algorithm is to switch padding from 'VALID' to 'SAME', which would result in an output shape of [1, 16, 16, 20]. If you're curious how padding works in TensorFlow, read this document.
Quiz: Number of Parameters
We’re now going to calculate the number of parameters of the convolutional layer. The answer from the last quiz will come into play here!
Being able to calculate the number of parameters in a neural network is useful since we want to have control over how much memory a neural network uses.
Setup
H = height, W = width, D = depth
- We have an input of shape 32x32x3 (HxWxD)
- 20 filters of shape 8x8x3 (HxWxD)
- A stride of 2 for both the height and width (S)
- Valid padding of size 1 ( P )
Output Layer
- 14x14x20 (HxWxD)
Hint
Without parameter sharing, each neuron in the output layer must connect to each neuron in the filter. In addition, each neuron in the output layer must also connect to a single bias neuron.
Solution: Number of Parameters
Solution
There are 756560
total parameters. That’s a HUGE amount! Here’s how we calculate it:
(8 * 8 * 3 + 1) * (14 * 14 * 20) = 756560
8 * 8 * 3 is the number of weights, and we add 1 for the bias. Remember, each weight is assigned to every single part of the output (14 * 14 * 20). So we multiply these two numbers together and we get the final answer.
Quiz: Parameter Sharing
Now we’d like you to calculate the number of parameters in the convolutional layer, if every neuron in the output layer shares its parameters with every other neuron in its same channel.
This is the number of parameters actually used in a convolution layer (tf.nn.conv2d()
).
Setup
H = height, W = width, D = depth
- We have an input of shape 32x32x3 (HxWxD)
- 20 filters of shape 8x8x3 (HxWxD)
- A stride of 2 for both the height and width (S)
- Zero padding of size 1 (P)
Output Layer
- 14x14x20 (HxWxD)
Hint
With parameter sharing, each neuron in an output channel shares its weights with every other neuron in that channel. So the number of parameters is equal to the number of neurons in the filter, plus a bias neuron, all multiplied by the number of channels in the output layer.
Convolution Layer Parameters 2
How many parameters does the convolution layer have (with parameter sharing)?
3860
Solution: Parameter Sharing
Solution
There are 3860
total parameters. That’s 196 times fewer parameters! Here’s how the answer is calculated:
(8 * 8 * 3 + 1) * 20 = 3840 + 20 = 3860
That’s 3840
weights and 20
biases. This should look similar to the answer from the previous quiz. The difference being it’s just 20
instead of (14 * 14 * 20)
. Remember, with weight sharing we use the same filter for an entire depth slice. Because of this we can get rid of 14 * 14
and be left with only 20
.
Visualizing CNNs
Visualizing CNNs
Let’s look at an example CNN to see how it works in action.
The CNN we will look at is trained on ImageNet as described in this paper by Zeiler and Fergus. In the images below (from the same paper), we’ll see what each layer in this network detects and see how each layer detects more and more complex ideas.
Layer 1
Example patterns that cause activations in the first layer of the network. These range from simple diagonal lines (top left) to green blobs (bottom middle).
The images above are from Matthew Zeiler and Rob Fergus’ deep visualization toolbox, which lets us visualize what each layer in a CNN focuses on.
Each image in the above grid represents a pattern that causes the neurons in the first layer to activate - in other words, they are patterns that the first layer recognizes. The top left image shows a -45 degree line, while the middle top square shows a +45 degree line. These squares are shown below again for reference.
As visualized here, the first layer of the CNN can recognize -45 degree lines.
The first layer of the CNN is also able to recognize +45 degree lines, like the one above.
Let’s now see some example images that cause such activations. The below grid of images all activated the -45 degree line. Notice how they are all selected despite the fact that they have different colors, gradients, and patterns.
Example patches that activate the -45 degree line detector in the first layer.
So, the first layer of our CNN clearly picks out very simple shapes and patterns like lines and blobs.
Layer 2
A visualization of the second layer in the CNN. Notice how we are picking up more complex ideas like circles and stripes. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.
The second layer of the CNN captures complex ideas.
As you see in the image above, the second layer of the CNN recognizes circles (second row, second column), stripes (first row, second column), and rectangles (bottom right).
The CNN learns to do this on its own. There is no special instruction for the CNN to focus on more complex objects in deeper layers. That’s just how it normally works out when you feed training data into a CNN.
Layer 3
A visualization of the third layer in the CNN. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.
The third layer picks out complex combinations of features from the second layer. These include things like grids, and honeycombs (top left), wheels (second row, second column), and even faces (third row, third column).
Layer 5
A visualization of the fifth and final layer of the CNN. The gray grid on the left represents how this layer of the CNN activates (or “what it sees”) based on the corresponding images from the grid on the right.
We’ll skip layer 4, which continues this progression, and jump right to the fifth and final layer of this CNN.
The last layer picks out the highest order ideas that we care about for classification, like dog faces, bird faces, and bicycles.
On to TensorFlow
This concludes our high-level discussion of Convolutional Neural Networks.
Next you’ll practice actually building these networks in TensorFlow.
TensorFlow Convolution Layer
TensorFlow Convolution Layer
Let’s examine how to implement a CNN in TensorFlow.
TensorFlow provides the tf.nn.conv2d()
and tf.nn.bias_add()
functions to create your own convolutional layers.
# Output depth
k_output = 64
# Image Properties
image_width = 10
image_height = 10
color_channels = 3
# Convolution filter
filter_size_width = 5
filter_size_height = 5
# Input/Image
input = tf.placeholder(
tf.float32,
shape=[None, image_height, image_width, color_channels])
# Weight and bias
weight = tf.Variable(tf.truncated_normal(
[filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))
# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
The code above uses the tf.nn.conv2d()
function to compute the convolution with weight
as the filter and [1, 2, 2, 1]
for the strides. TensorFlow uses a stride for each input
dimension,[batch, input_height, input_width, input_channels]
. We are generally always going to set the stride for batch
and input_channels
(i.e. the first and fourth element in the strides
array) to be 1
.
You’ll focus on changing input_height
and input_width
while setting batch
and input_channels
to 1. The input_height
and input_width
strides are for striding the filter over input
. This example code uses a stride of 2 with 5x5 filter over input
.
The tf.nn.bias_add()
function adds a 1-d bias to the last dimension in a matrix.
Explore The Design Space
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FG7M9tWH2nQ.mp4
TensorFlow Max Pooling
TensorFlow Max Pooling
By Aphex34 (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
The image above is an example of max pooling with a 2x2 filter and stride of 2. The four 2x2 colors represent each time the filter was applied to find the maximum value.
For example, [[1, 0], [4, 6]]
becomes 6
, because 6
is the maximum value in this set. Similarly, [[2, 3], [6, 8]]
becomes 8
.
Conceptually, the benefit of the max pooling operation is to reduce the size of the input, and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.
TensorFlow provides the tf.nn.max_pool()
function to apply max pooling to your convolutional layers.
...
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
conv_layer,
ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1],
padding='SAME')
The tf.nn.max_pool()
function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.
The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.
Quiz: Pooling Intuition
The next few quizzes will test your understanding of pooling layers.
QUIZ QUESTION
A pooling layer is generally used to …
Increase the size of the output
(correct) Decrease the size of the output
(correct) Prevent overfitting
Gain information
Solution: Pooling Intuition
Solution
The correct answer is decrease the size of the output and prevent overfitting. Preventing overfitting is a consequence of reducing the output size, which in turn, reduces the number of parameters in future layers.
Recently, pooling layers have fallen out of favor. Some reasons are:
- Recent datasets are so big and complex we’re more concerned about underfitting.
- Dropout is a much better regularizer.
- Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.
Quiz: Pooling Mechanics
Setup
H = height, W = width, D = depth
We have an input of shape 4x4x5 (HxWxD)
Filter of shape 2x2 (HxW)
A stride of 2 for both the height and width (S)
Recall the formula for calculating the new height or width:
new_height = (input_height - filter_height)/S + 1
new_width = (input_width - filter_width)/S + 1
NOTE: For a pooling layer the output depth is the same as the input depth. Additionally, the pooling operation is applied individually for each depth slice.
The image below gives an example of how a max pooling layer works. In this case, the max pooling filter has a shape of 2x2. As the max pooling filter slides across the input layer, the filter will output the maximum value of the 2x2 square.
Pooling Layer Output Shape
What’s the shape of the output? Format is HxWxD.
2x2x5
Solution: Pooling Mechanics
Solution
The answer is 2x2x5. Here’s how it’s calculated using the formula:
(4 - 2)/2 + 1 = 2
(4 - 2)/2 + 1 = 2
The depth stays the same.
Here’s the corresponding code:
input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
pool = tf.nn.max_pool(input, filter_shape, strides, padding)
The output shape of pool
will be [1, 2, 2, 5], even if padding
is changed to 'SAME'
.
Quiz: Pooling Practice
Great, now let’s practice doing some pooling operations manually.
Max Pooling
What's the result of a max pooling operation on the input: [[[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]]]
Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1. The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.
Work from the top left to the bottom right
Solution: Pooling Practice
Solution
The correct answer is 2.5,10,15,6. We start with the four numbers in the top left corner. Then we work left-to-right and top-to-bottom, moving 2 units each time.
max(0, 1, 2, 2.5) = 2.5
max(0.5, 10, 1, -8) = 10
max(4, 0, 15, 1) = 15
max(5, 6, 2, 3) = 6
Quiz: Average Pooling
Mean Pooling
What's the result of an average (or mean) pooling operation on the input: [[[0, 1, 0.5, 10],
[2, 2.5, 1, -8],
[4, 0, 5, 6],
[15, 1, 2, 3]]]
Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1. The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.
Answer to 3 decimal places. Work from the top left to the bottom right
Solution: Average Pooling
Solution
The correct answer is 1.375,0.875,5,4. We start with the four numbers in the top left corner. Then we work left-to-right and top-to-bottom, moving 2 units each time.
mean(0, 1, 2, 2.5) = 1.375
mean(0.5, 10, 1, -8) = 0.875
mean(4, 0, 15, 1) = 5
mean(5, 6, 2, 3) = 4
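If you'd like to double-check both answers numerically, a short NumPy sketch (not part of the lesson) reproduces them:
import numpy as np
x = np.array([[0, 1, 0.5, 10],
              [2, 2.5, 1, -8],
              [4, 0, 5, 6],
              [15, 1, 2, 3]])
# Split the 4x4 input into non-overlapping 2x2 blocks (stride 2), then reduce each block
blocks = x.reshape(2, 2, 2, 2)   # axes: (row_block, row_in_block, col_block, col_in_block)
print(blocks.max(axis=(1, 3)))   # [[ 2.5 10. ] [15.   6. ]]   -> max pooling
print(blocks.mean(axis=(1, 3)))  # [[ 1.375 0.875] [ 5.   4.  ]] -> average pooling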
1x1 Convolutions
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Zmzgerm6SjA.mp4
Inception Module
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/SlTm03bEOxA.mp4
Convolutional Network in TensorFlow
Convolutional Network in TensorFlow
It’s time to walk through an example Convolutional Neural Network (CNN) in TensorFlow.
The structure of this network follows the classic structure of CNNs, which is a mix of convolutional layers and max pooling, followed by fully-connected layers.
The code you’ll be looking at is similar to what you saw in the segment on Deep Neural Network in TensorFlow, except we restructured the architecture of this network as a CNN.
Just like in that segment, here you’ll study the line-by-line breakdown of the code. If you want, you can even download the code and run it yourself.
Thanks to Aymeric Damien for providing the original TensorFlow model on which this segment is based.
Time to dive in!
Dataset
You’ve seen this section of code from previous lessons. Here we’re importing the MNIST dataset and using a convenient TensorFlow function to batch, scale, and One-Hot encode the data.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
import tensorflow as tf
# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128
# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
test_valid_size = 256
# Network Parameters
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units
Weights and Biases
# Store layers weight & bias
weights = {
'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
'out': tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {
'bc1': tf.Variable(tf.random_normal([32])),
'bc2': tf.Variable(tf.random_normal([64])),
'bd1': tf.Variable(tf.random_normal([1024])),
'out': tf.Variable(tf.random_normal([n_classes]))}
Convolutions
Convolution with 3×3 Filter. Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
The above is an example of a convolution with a 3x3 filter and a stride of 1 being applied to data with a range of 0 to 1. The convolution for each 3x3 section is calculated against the weight [[1, 0, 1], [0, 1, 0], [1, 0, 1]], then a bias is added to create the convolved feature on the right. In this case, the bias is zero. In TensorFlow, this is all done using tf.nn.conv2d() and tf.nn.bias_add().
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
The tf.nn.conv2d() function computes the convolution against weight W as shown above.
In TensorFlow, strides is an array of 4 elements; the first element indicates the stride for the batch and the last element indicates the stride for features. It’s good practice to remove the batches or features you want to skip from the data set rather than use a stride to skip them. You can always set the first and last element to 1 in strides in order to use all batches and features.
The middle two elements are the strides for height and width respectively. I’ve mentioned stride as one number because you usually have a square stride where height = width. When someone says they are using a stride of 3, they usually mean tf.nn.conv2d(x, W, strides=[1, 3, 3, 1]).
To make life easier, the code is using tf.nn.bias_add() to add the bias. Using tf.add() doesn’t work when the tensors aren’t the same shape.
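As an aside (my own illustration, not part of the lesson code), the sketch below applies that same 3x3 weight with tf.nn.conv2d() and tf.nn.bias_add() to a made-up 5x5 input:
import numpy as np
import tensorflow as tf

# A hypothetical 5x5 single-channel input, just to illustrate the arithmetic.
image = np.arange(25, dtype=np.float32).reshape(1, 5, 5, 1)

# The fixed 3x3 weight from the figure, shaped (height, width, input_depth, output_depth).
W = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=np.float32).reshape(3, 3, 1, 1)
b = np.zeros(1, dtype=np.float32)   # bias of zero, as in the figure

conv = tf.nn.bias_add(
    tf.nn.conv2d(tf.constant(image), tf.constant(W),
                 strides=[1, 1, 1, 1], padding='VALID'),
    tf.constant(b))

with tf.Session() as sess:
    # (1, 3, 3, 1): each output value is the sum of the 5 pixels where the weight is 1.
    print(sess.run(conv).shape)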
Max Pooling
Max Pooling with 2x2 filter and stride of 2. Source: http://cs231n.github.io/convolutional-networks/
The above is an example of max pooling with a 2x2 filter and stride of 2. The left square is the input and the right square is the output. The four 2x2 colors in the input represent each time the filter was applied to create the max on the right side. For example, [[1, 1], [5, 6]] becomes 6 and [[3, 2], [1, 2]] becomes 3.
def maxpool2d(x, k=2):
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')
The tf.nn.max_pool() function does exactly what you would expect: it performs max pooling with the ksize parameter as the size of the filter.
Model
Image from Explore The Design Space video
In the code below, we’re creating 3 layers alternating between convolutions and max pooling, followed by a fully connected and output layer. The transformation of each layer to new dimensions is shown in the comments. For example, the first layer shapes the images from 28x28x1 to 28x28x32 in the convolution step. The next step applies max pooling, turning each sample into 14x14x32. All the layers are applied from conv1 to output, producing 10 class predictions.
def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)
    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)
    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
Session
Now let’s run it!
# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)
# Model
logits = conv_net(x, weights, biases, keep_prob)
# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)
# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: dropout})
            # Calculate batch loss and accuracy
            loss = sess.run(cost, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: 1.})
            valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})
            print('Epoch {:>2}, Batch {:>3} - '
                  'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
                epoch + 1,
                batch + 1,
                loss,
                valid_acc))
    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))
That’s it! That is a CNN in TensorFlow. Now that you’ve seen a CNN in TensorFlow, let’s see if you can apply it on your own!
TensorFlow Convolution Layer
Using Convolution Layers in TensorFlow
Let’s now apply what we’ve learned to build real CNNs in TensorFlow. In the exercise below, you’ll be asked to set up the dimensions of the convolution filters, the weights, and the biases. This is in many ways the trickiest part of using CNNs in TensorFlow. Once you have a sense of how to set up the dimensions of these attributes, applying CNNs will be far more straightforward.
Review
You should go over the TensorFlow documentation for 2D convolutions. Most of the documentation is straightforward, except perhaps the padding argument. The padding might differ depending on whether you pass 'VALID' or 'SAME'.
Here are a few more things worth reviewing:
Introduction to TensorFlow -> TensorFlow Variables.
How to determine the dimensions of the output based on the input size and the filter size (shown below). You’ll use this to determine what the size of your filter should be.
new_height = (input_height - filter_height + 2 * P)/S + 1
new_width = (input_width - filter_width + 2 * P)/S + 1
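A quick way to sanity-check these numbers is a tiny helper function (my own addition, not part of the exercise):
def conv_output_dim(input_dim, filter_dim, padding, stride):
    # Evaluates the formula above for one spatial dimension.
    return (input_dim - filter_dim + 2 * padding) // stride + 1

# Example: a 32x32 input with a 5x5 filter, padding of 2 and stride of 1 stays at 32.
print(conv_output_dim(32, 5, padding=2, stride=1))   # 32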
Instructions
- Finish off each TODO in the conv2d function.
- Setup the strides, padding and filter weight/bias (F_w and F_b) such that the output shape is (1, 2, 2, 3). Note that all of these except strides should be TensorFlow variables.
Solution: TensorFlow Convolution Layer
Solution
Here’s how I did it. NOTE: there’s more than 1 way to get the correct output shape. Your answer might differ from mine.
def conv2d(input):
    # Filter (weights and bias)
    F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3)))
    F_b = tf.Variable(tf.zeros(3))
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b
I want to transform the input shape (1, 4, 4, 1) to (1, 2, 2, 3). I choose ‘VALID’ for the padding algorithm. I find it simpler to understand and it achieves the result I’m looking for.
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
Plugging in the values:
out_height = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
out_width = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
In order to change the depth from 1 to 3, I have to set the output depth of my filter appropriately:
F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3))) # (height, width, input_depth, output_depth)
F_b = tf.Variable(tf.zeros(3)) # (output_depth)
The input has a depth of 1, so I set that as the input_depth of the filter.
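As a quick check (not part of the exercise code), you could run the solution’s conv2d on a dummy (1, 4, 4, 1) input and confirm the output shape:
import numpy as np
import tensorflow as tf

# A placeholder input of ones, purely for a shape check.
x = tf.constant(np.ones((1, 4, 4, 1), dtype=np.float32))
out = conv2d(x)   # conv2d as defined in the solution above

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # F_W and F_b are variables
    print(sess.run(out).shape)   # (1, 2, 2, 3)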
TensorFlow Pooling Layer
Using Pooling Layers in TensorFlow
In the exercise below, you’ll be asked to set up the dimensions of the pooling filters, strides, as well as the appropriate padding. You should go over the TensorFlow documentation for tf.nn.max_pool(). Padding works the same as it does for a convolution.
Instructions
Finish off each TODO in the maxpool function.
Setup the strides, padding and ksize such that the output shape after pooling is (1, 2, 2, 1).
Solution: TensorFlow Pooling Layer
Solution
Here’s how I did it. NOTE: there’s more than 1 way to get the correct output shape. Your answer might differ from mine.
def maxpool(input):
    ksize = [1, 2, 2, 1]
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.max_pool(input, ksize, strides, padding)
I want to transform the input shape (1, 4, 4, 1) to (1, 2, 2, 1). I choose 'VALID' for the padding algorithm. I find it simpler to understand and it achieves the result I’m looking for.
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
Plugging in the values:
out_height = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
out_width = ceil(float(4 - 2 + 1) / float(2)) = ceil(1.5) = 2
The depth doesn’t change during a pooling operation so I don’t have to worry about that.
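Similarly, a quick shape check for the pooling solution (again, an illustrative addition of mine) might look like:
import numpy as np
import tensorflow as tf

# Random values are fine here; we only care about the output shape.
x = tf.constant(np.random.randn(1, 4, 4, 1).astype(np.float32))
out = maxpool(x)   # maxpool as defined in the solution above

with tf.Session() as sess:
    print(sess.run(out).shape)   # (1, 2, 2, 1)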
CNNs - Additional Resources
Additional Resources
There are many wonderful free resources that allow you to go into more depth around Convolutional Neural Networks. In this course, our goal is to give you just enough intuition to start applying this concept on real world problems so you have enough of an exposure to explore more on your own. We strongly encourage you to explore some of these resources more to reinforce your intuition and explore different ideas.
These are the resources we recommend in particular:
- Andrej Karpathy’s CS231n Stanford course on Convolutional Neural Networks.
- Michael Nielsen’s free book on Deep Learning.
- Goodfellow, Bengio, and Courville’s more advanced free book on Deep Learning.
DEEP LEARNING PROJECT
Project Details
Introduction to the Project
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/awEYy2Df3hg.mp4
Starting the Project
Starting the Project
For this assignment, you can find the image_classification folder containing the necessary project files on the Machine Learning projects GitHub, under the projects folder. You may download all of the files for projects we’ll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains 3 files:
- image_classification.ipynb: This is the main file where you will be performing your work on the project.
- Two helper files, problem_unittests.py and helper.py
Submitting the Project
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Object Classification Program project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub repo in a folder named image_recognition for ease of access:
- The image_classification.ipynb notebook file with all questions answered and all code cells executed and displaying output, along with the .html version of the notebook.
- All helper files.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
PROJECT
Implement this project
- download this file from github
- open virtualbox
- copy the downloaded file to my shared file C:\Users\SSQ\virtualbox share
- type sudo mount -t vboxsf virtualbox_share /mnt/ in the ubuntu terminal
- type jupyter notebook image_classification.ipynb in the right directory
error
ImportError: No module named request
(failed) try with anaconda3 in ubuntu
- download anaconda3 from this web
- type ./Anaconda3-4.3.1-Linux-x86_64.sh in the terminal to run the sh file
Anaconda3 will now be installed into this location: /home/ssq/anaconda3
installation finished.
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/ssq/.bashrc ? [yes|no]
[no] >>>
You may wish to edit your .bashrc or prepend the Anaconda3 install location:
$ export PATH=/home/ssq/anaconda3/bin:$PATH
Thank you for installing Anaconda3!
Share your notebooks and packages on Anaconda Cloud!
Sign up for free: https://anaconda.org
- export PATH=/home/ssq/anaconda3/bin:$PATH
- in your .ipynb location: conda create -n tensorflow
error:
ModuleNotFoundError: No module named 'tqdm'
method: conda install -c conda-forge tqdm
Package plan for installation in environment /home/ssq/anaconda3:
The following NEW packages will be INSTALLED:
    tqdm: 4.11.2-py36_0 conda-forge
The following packages will be SUPERCEDED by a higher-priority channel:
    conda: 4.3.14-py36_0 --> 4.2.13-py36_0 conda-forge
    conda-env: 2.6.0-0 --> 2.6.0-0 conda-forge
Proceed ([y]/n)? y
- Anaconda installation:
export PATH=/home/ssq/anaconda3/bin:$PATH
source activate tensorflow
- conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
conda install tensorflow
conda install -c conda-forge tensorflow
- conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow
pip3 install --upgrade pip
(failed) pip3 install tensorflow in ubuntu
- from this web to create a new vb
- install the VirtualBox Guest Additions and enable the bidirectional clipboard
- set the shared file from this blog and type sudo mount -t vboxsf virtualbox_share /mnt/
- type sudo apt install python3-pip in the terminal to install pip for python3
- pip3 install tensorflow
- run python3 and test
(success) anaconda3 install in Win7 tensorflow
- download anaconda3 from this web
- conda create -n tensorflow python=3.5
activate tensorflow
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow
(tensorflow) C:\Users\SSQ>cd C:\Users\SSQ\virtualbox share\image-classification
(tensorflow) C:\Users\SSQ\virtualbox share\image-classification>jupyter notebook image_classification.ipynb
ModuleNotFoundError: No module named 'tqdm'
method: (tensorflow) C:\Users\SSQ\virtualbox share\image-classification>conda install tqdm
anaconda3 install in win7 tensorflow-gpu
- view this page and this blog
- cuda_8.0.61_windows in win7
- cudnn-8.0-windows7-x64-v6.0
- conda create -n tensorflow-gpu python=3.5
activate tensorflow-gpu
conda install pandas matplotlib jupyter notebook scipy scikit-learn
pip install tensorflow-gpu
Submission
Image Classification
Project Submission
Image Classification
Introduction
In this project, you’ll classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. You’ll preprocess the dataset, then train a convolutional neural network on all the samples. You’ll normalize the images, one-hot encode the labels, and build a convolutional layer, a max pooling layer, and a fully connected layer. At the end, you’ll see the network’s predictions on the sample images.
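As a rough preview of the preprocessing, here is a minimal sketch of what normalizing and one-hot encoding could look like; the notebook defines its own function signatures and unit tests, so treat the helpers below (normalize, one_hot_encode) as illustrative assumptions rather than the project’s required implementation:
import numpy as np

def normalize(x):
    # Scale pixel values from [0, 255] down to [0, 1].
    return x / 255.0

def one_hot_encode(labels, n_classes=10):
    # Turn a list of CIFAR-10 label indices into one-hot vectors.
    one_hot = np.zeros((len(labels), n_classes))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot

print(one_hot_encode([0, 3, 9]))   # three rows, each with a single 1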
Getting the project files
The project files can be found in our public GitHub repo, in the image-classification folder. You can download the files from there, but it’s better to clone the repository to your computer. This way you can stay up to date with any changes we make by pulling the changes to your local repository with git pull.
Submission
- Ensure you’ve passed all the unit tests in the notebook.
- Ensure you pass all points on the rubric.
- When you’re done with the project, please save the notebook as an HTML file. You can do this by going to the File menu in the notebook and choosing “Download as” > HTML. Ensure you submit both the Jupyter Notebook and its HTML version together.
- Package the “dlnd_image_classification.ipynb”, “helper.py”, “problem_unittests.py”, and the HTML file into a zip archive, or push the files from your GitHub repo.
- Hit Submit Project below!
Submit Your Project
submit
view submission
reference
Career: Interview Practice
Machine Learning Specializations
Capstone Proposal
PROJECT
Writing up a Capstone proposal
Overview
Capstone Proposal Overview
Please note that once your Capstone Proposal has been submitted and you have passed the evaluation, you have to submit your Capstone project using the same proposal that you submitted. We do not allow the Capstone Proposal and the Capstone project to differ in terms of dataset and approach.
In this capstone project proposal, prior to completing the following Capstone Project, you will leverage what you’ve learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points:
- The project’s domain background — the field of research where the project is derived;
- A problem statement — a problem being investigated for which a solution will be defined;
- The datasets and inputs — data or inputs being used for the problem;
- A solution statement — the solution proposed for the given problem;
- A benchmark model — some simple or historical model or result to compare the defined solution to;
- A set of evaluation metrics — functional representations for how the solution can be measured;
- An outline of the project design — how the solution will be developed and results obtained.
Capstone Proposal Highlights
The capstone project proposal is designed to introduce you to writing proposals for major projects. Typically, before you begin working on a solution to a problem, a proposal is written to your peers, advisor, manager, etc., to outline the details of the problem, your research, and your approach to a solution.
Things you will learn by completing this project proposal:
- How to research a real-world problem of interest.
- How to author a technical proposal document.
- How to organize a proposed workflow for designing a solution.
Description
Capstone Proposal Description
Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:
In addition, you may find a technical domain (along with the problem and dataset) as competitions on platforms such as Kaggle, or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.
To determine whether your project and the problem you want to solve fit Udacity’s vision of a Machine Learning Capstone Project, please refer to the capstone proposal rubric and the capstone project rubric and make a note of each rubric criterion you will be evaluated on. A satisfactory project will have a proposal that clearly satisfies these requirements.
Software and Data Requirements
Software Requirements
Your proposed project must be written in Python 2.7. Given the free-form nature of the machine learning capstone, the software and libraries you will need to successfully complete your work will vary depending on the chosen application area and problem definition. Because of this, it is imperative that all necessary software and libraries you consider using in your capstone project are accessible and clearly documented. Please note that proprietary software, software that requires private licenses, or software behind a paywall or login account should be avoided.
Data Requirements
Every machine learning capstone project will most certainly require some form of dataset or input data structure (input text files, images, etc.). Similar to the software requirements above, the data you are considering must either be publicly accessible or provided by you during the submission process, and private or proprietary data should not be used without expressed permission. Please take into consideration the file size of your data — while there is no strict upper limit, input files that are excessively large may require reviewers longer than an acceptable amount of time to acquire all of your project files. This can take away from the reviewer’s time that could be put towards evaluating your proposal. If the data you are considering fits the criteria of being too large, consider whether you could work with a subset of the data instead, or provide a representative sample of the data.
Ethics
Udacity’s A/B Testing course, as part of the Data Analyst Nanodegree, has a segment that discusses the sensitivity of data and the expectation of privacy from those whose information has been collected. While most data you find available to the public will not have any ethical complications, it is extremely important that you are considering where the data you are using came from, and whether that data contains any sensitive information. For example, if you worked for a bank and wanted to use customers’ bank statements as part of your project, this would most likely be an unethical choice of data and should be avoided.
If you have any questions regarding the nature of a dataset or software you intend to use for the capstone project, please send an email to machine-support@udacity.com with the subject “Capstone Project Dataset/Software Inquiry”.
Proposal Guidelines
Report Guidelines
Your project submission will be evaluated on the written proposal that is submitted. Additionally, depending on the project you are proposing, other materials such as the data being used will be evaluated. It is expected that the proposal contains enough detail, documentation, analysis, and discussion to adequately reflect the work you intend to complete for the project. Because of this, it is extremely important that the proposal is written in a professional, standardized way, so those who review your project’s proposal are able to clearly identify each component of your project in the report. Without a properly written proposal, your project cannot be sufficiently evaluated. A project proposal template is provided for you to understand how a project proposal should be structured. We strongly encourage students to have a proposal that is approximately two to three pages in length.
The Machine Learning Capstone Project proposal should be treated no different than a written research paper for academics. Your goal is to ultimately present the research you’ve discovered into the respective problem domain you’ve chosen, and then clearly articulate your intended project to your peers. The narrative found in the project proposal template provides for a “proposal checklist” that will aid you in fully completing a documented proposal. Please make use of this resource!
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Capstone Project Proposal rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
At minimum, your submission will be required to have the following files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:
- A project proposal, in PDF format only, with the name proposal.pdf, addressing each of the seven key points of a proposal. The recommended page length for a proposal is approximately two to three pages.
- Any additional supporting material such as datasets, images, or input files that are necessary for your project and proposal. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in an included README.md file.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Capstone Proposal
Project Submission
In this capstone project proposal, prior to completing the following Capstone Project, you will leverage what you’ve learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points:
- The project’s domain background — the field of research where the project is derived;
- A problem statement — a problem being investigated for which a solution will be defined;
- The datasets and inputs — data or inputs being used for the problem;
- A solution statement — the solution proposed for the given problem;
- A benchmark model — some simple or historical model or result to compare the defined solution to;
- A set of evaluation metrics — functional representations for how the solution can be measured;
- An outline of the project design — how the solution will be developed and results obtained.
Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:
In addition, you may find a technical domain (along with the problem and dataset) as competitions on platforms such as Kaggle, or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.
Evaluation
Your project will be reviewed by a Udacity reviewer against the Capstone Project Proposal rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
At minimum, your submission will be required to have the following files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:
- A project proposal, in PDF format only, with the name proposal.pdf, addressing each of the seven key points of a proposal. The recommended page length for a proposal is approximately two to three pages.
- Any additional supporting material such as datasets, images, or input files that are necessary for your project and proposal. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in an included README.md file.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
Capstone Project
PROJECT
Machine Learning Capstone Project
Overview
Capstone Project Overview
In this capstone project, you will leverage what you’ve learned throughout the Nanodegree program to solve a problem of your choice by applying machine learning algorithms and techniques. You will first define the problem you want to solve and investigate potential solutions and performance metrics. Next, you will analyze the problem through visualizations and data exploration to have a better understanding of what algorithms and features are appropriate for solving it. You will then implement your algorithms and metrics of choice, documenting the preprocessing, refinement, and postprocessing steps along the way. Afterwards, you will collect results about the performance of the models used, visualize significant quantities, and validate/justify these values. Finally, you will construct conclusions about your results, and discuss whether your implementation adequately solves the problem.
Capstone Project Highlights
This project is designed to prepare you for delivering a polished, end-to-end solution report of a real-world problem in a field of interest. When developing new technology, or deriving adaptations of previous technology, properly documenting your process is critical for both validating and replicating your results.
Things you will learn by completing this project:
- How to research and investigate a real-world problem of interest.
- How to accurately apply specific machine learning algorithms and techniques.
- How to properly analyze and visualize your data and results for validity.
- How to document and write a report of your work.
Description
Capstone Description
Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as data sets) to complete this project, and make the appropriate citations wherever necessary in your report. Below are a few suggested problem areas you could explore if you are unsure what your passion is:
In addition, you may find a technical domain (along with the problem and dataset) as competitions on platforms such as Kaggle, or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.
Note: For students who have enrolled before October 17th, we strongly encourage that you look at the Capstone Proposal project that is available as an elective before this project. If you have an idea for your capstone project but aren’t ready to begin working on the implementation, or even if you want to get feedback on how you will approach a solution to your problem, you can use the Capstone Proposal project to have a peer-review from one of our Capstone Project reviewers!
For whichever application area or problem you ultimately investigate, there are five major stages to this capstone project which you will move through and subsequently document. Each stage plays a significant role in the development life cycle of beginning with a problem definition and finishing with a polished, working solution. As you make your way through developing your project, be sure that you are also working on a rough draft of your project report, as it is the most important aspect to your submission!
To determine whether your project and the problem you want to solve fit Udacity’s vision of a Machine Learning Capstone Project, please refer to the capstone project rubric and make a note of each rubric criterion you will be evaluated on. A satisfactory project will have a report that encompasses each stage and component of the rubric.
Software and Data Requirements
Software Requirements
Your project must be written in Python 2.7. Given the free-form nature of the machine learning capstone, the software and libraries you will need to successfully complete your work will vary depending on the chosen application area and problem definition. Because of this, it is imperative that all necessary software and libraries used in your capstone project are accessible to the reviewer and clearly documented. Information regarding the software and libraries your project makes use of should be included in the README along with your submission. Please note that proprietary software, software that requires private licenses, or software behind a paywall or login account should be avoided.
Data Requirements
Every machine learning capstone project will most certainly require some form of dataset or input data structure (input text files, images, etc.). Similar to the software requirements above, the data you use must either be publicly accessible or provided by you during the submission process, and private or proprietary data should not be used without expressed permission. Please take into consideration the file size of your data — while there is no strict upper limit, input files that are excessively large may require reviewers longer than an acceptable amount of time to acquire all of your project files and/or execute the provided development code. This can take away from the reviewer’s time that could be put towards evaluating your submission. If the data you are working with fits the criteria of being too large, consider whether you can work with a subset of the data instead, or provide a representative sample of the data which the reviewer may use to verify the solution explored in the project.
Ethics
Udacity’s A/B Testing course, as part of the Data Analyst Nanodegree, has a segment that discusses the sensitivity of data and the expectation of privacy from those whose information has been collected. While most data you find available to the public will not have any ethical complications, it is extremely important that you are considering where the data you are using came from, and whether that data contains any sensitive information. For example, if you worked for a bank and wanted to use customers’ bank statements as part of your project, this would most likely be an unethical choice of data and should be avoided.
Report Guidelines
Report Guidelines
Your project submission will be evaluated primarily on the report that is submitted. It is expected that the project report contains enough detail, documentation, analysis, and discussion to adequately reflect the work you completed for your project. Because of this, it is extremely important that the report is written in a professional, standardized way, so those who review your project submission are able to clearly identify each component of your project in the report. Without a properly written report, your project cannot be sufficiently evaluated. A project report template is provided for you to understand how a project report should be structured. We strongly encourage students to have a report that is approximately nine to fifteen pages in length.
The Machine Learning Capstone Project report should be treated no different than a written research paper for academics. Your goal is to ultimately present the research you’ve discovered into the respective problem domain you’ve chosen, and then discuss each stage of the project as they are completed. The narrative found in the project report template provides for a “report checklist” that will aid you in staying on track for both your project and the documentation in your report. Each stage can be found as a section that will guide you through each component of the project development life cycle. Please make use of this resource!
Example Reports
Example Machine Learning Capstone Reports
Included in the project files for the Capstone are three example reports that were written by students just like yourselves. Because the written report for your project will be how you are evaluated, it is absolutely critical that you are producing a clear, detailed, well-written report that adequately reflects the work that you’ve completed for your Capstone. Following along with the Capstone Guidelines will be very helpful as you begin writing your report.
Our first example report comes from graduate Martin Bede, whose project design in the field of computer vision, named “Second Sight”, was to create an Android application that would extract text from the device’s camera and read it aloud. Martin’s project cites the growing concern of vision loss as motivation for developing software that can aid those unable to see or read certain print.
Our second example report comes from an anonymous graduate whose project design in the field of image recognition was to implement a Convolutional Neural Network (CNN) to train on the Cifar-10 dataset and successfully identify different objects in new images. This student describes with thorough detail how a CNN can be used quite effectively as a descriptor-learning image recognition algorithm.
Our third example report comes from graduate Naoki Shibuya, who took advantage of the pre-curated robot motion planning “Plot and Navigate a Virtual Maze” project. Pay special attention to the emphasis Naoki places on discussing the methodology and results: Projects relying on technical implementations require valuable observations and visualizations of how the solution performs under various circumstances and constraints.
Each example report given has many desirable qualities we expect from students when completing the Machine Learning Capstone project. Once you begin writing your project report for whichever problem domain you choose, be sure to reference these examples whenever necessary!
Submitting the Project
Evaluation
Your project will be reviewed by a Udacity reviewer against the Machine Learning Capstone project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
At minimum, your submission will be required to have the following files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:
- Your capstone proposal document as proposal.pdf if you have completed the pre-requisite Capstone Proposal project. Please also include your review link in the student submission notes.
- A project report (in PDF format only) addressing the five major project development stages. The recommended page length for a project report is approximately nine to fifteen pages. Please do not export an iPython Notebook as PDF for your project report.
- All development Python code used for your project that is required to reproduce your implemented solution and result. Your code should be in a neat and well-documented format. Using iPython Notebooks is strongly encouraged for development.
- A README documentation file which briefly describes the software and libraries used in your project, including any necessary references to supporting material. If your project requires setup/startup, ensure that your README includes the necessary instructions.
- Any additional supporting material such as datasets, images, or input files that are necessary for your project’s development and implementation. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in your included README.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
Submission
Capstone Project
Project Submission
In this capstone project, you will leverage what you’ve learned throughout the Nanodegree program to solve a problem of your choice by applying machine learning algorithms and techniques. You will first define the problem you want to solve and investigate potential solutions and performance metrics. Next, you will analyze the problem through visualizations and data exploration to have a better understanding of what algorithms and features are appropriate for solving it. You will then implement your algorithms and metrics of choice, documenting the preprocessing, refinement, and postprocessing steps along the way. Afterwards, you will collect results about the performance of the models used, visualize significant quantities, and validate/justify these values. Finally, you will construct conclusions about your results, and discuss whether your implementation adequately solves the problem.
Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. Below are a few suggested problem areas you could explore if you are unsure what your passion is:
In addition, you may find a technical domain (along with the problem and dataset) as competitions on platforms such as Kaggle, or Devpost. This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms.
Note: For students who have enrolled before October 17th, we strongly encourage that you look at the Capstone Proposal project that is available as an elective before this project. If you have an idea for your capstone project but aren’t ready to begin working on the implementation, or even if you want to get feedback on how you will approach a solution to your problem, you can use the Capstone Proposal project to have a peer-review from one of our Capstone Project reviewers!
For whichever application area or problem you ultimately investigate, there are five major stages to this capstone project which you will move through and subsequently document. Each stage plays a significant role in the development life cycle of beginning with a problem definition and finishing with a polished, working solution. As you make your way through developing your project, be sure that you are also working on a rough draft of your project report, as it is the most important aspect to your submission!
Evaluation
Your project will be reviewed by a Udacity reviewer against the Machine Learning Capstone project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must meet specifications for you to pass.
Submission Files
At minimum, your submission will be required to have the following files listed below. If your submission method of choice is uploading an archive (*.zip), please take into consideration the total file size. You will need to include:
- Your capstone proposal document as proposal.pdf if you have completed the pre-requisite Capstone Proposal project. Please also include your review link in the student submission notes.
- A project report (in PDF format only) addressing the five major project development stages. The recommended page length for a project report is approximately nine to fifteen pages. Please do not export an iPython Notebook as PDF for your project report.
- All development Python code used for your project that is required to reproduce your implemented solution and result. Your code should be in a neat and well-documented format. Using iPython Notebooks is strongly encouraged for development.
- A README documentation file which briefly describes the software and libraries used in your project, including any necessary references to supporting material. If your project requires setup/startup, ensure that your README includes the necessary instructions.
- Any additional supporting material such as datasets, images, or input files that are necessary for your project’s development and implementation. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in your included README.
I’m Ready!
When you’re ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at machine-support@udacity.com or visit us in the discussion forums.
What’s Next?
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!
Supporting Materials
Videos Zip File
THE MNIST DATABASE of handwritten digits
Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks
selfdrivingcars
(2,2),25,500,128,200,success
Testing Accuracy: 0.8081980186934564
First result
1 convnet
1 fully connected layer
save as capstone_model.meta
Testing Accuracy: 0.808136261212445
Second result
3 convnets
1 fully connected layer
Testing Accuracy: 0.8427798201640447
with the full dataset
Testing Accuracy: 0.8851721937559089